faircloth-lab / phyluce

software for UCE (and general) phylogenomics
http://phyluce.readthedocs.org/

stampy: Error: Overfull hash table #324

Open Ruben9161 opened 7 months ago

Ruben9161 commented 7 months ago

Dear Brant,

I'm currently working through phyluce Tutorial IV, "Identifying UCE Loci and Designing Baits To Target Them", with my own data instead of the coleopteran genomes used in the tutorial.

We were able to follow the protocol up to the section "Prepare the base genome". There we ran the command stampy.py -g vaVi1 -H vaVi1, but got the following error messages:

stampy.py -g vaVi1 -H vaVi1
stampy: Building genome...
stampy: Input files ['vaVi1.fasta']
stampy: Done
stampy: Building hash table...
stampy: Initializing...
stampy: Counting...
stampy: Initializing hash...
stampy: Flagging high counts...
stampy: Creating hash...
stampy: Error: has reached fill factor 0.9827364992 \ aborting
stampy: Suggest increasing stride or hash table

stampy: Error: Overfull hash table

We searched GitHub and various blogs but could not find out how to increase the stride or the hash table. Is there a command or option to increase either of them?

Thanks for your help, Rubén.

brantfaircloth commented 7 months ago

Do you have access to a machine with more RAM - that’s the only solution that I can think of…

jovana03 commented 1 month ago

Hi.

I had a similar problem and found that the cause is not always RAM but can be the limits of the software (I run everything on a cluster with plenty of RAM). Following the bait design tutorial, running stampy.py -g YOUR-BASE-GENOME -H YOUR-BASE-GENOME with a big genome (in this example approx. 6G) gives something like:

Genome too large (6304282557 entries in image; max is 5368709120)

Then I tried a genome of approx. 3.6G and got the same error, which makes no sense to me because, as far as I know, that size is within the limits of stampy.
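
For anyone who wants a quick check before starting a long stampy run, here is a rough sketch in Python. It only counts the sequence characters in a FASTA file and compares the total with the maximum from the error message above (5368709120, which is 5 * 2**30). The function names are my own and are not part of stampy or phyluce. It also assumes that the number of "entries in image" roughly tracks total sequence length, which I have not confirmed. The 3.6G case suggests the two can differ, so a count below the limit is no guarantee that the build will succeed.

```python
import io

# Limit copied from the stampy error message quoted in this thread.
STAMPY_MAX_ENTRIES = 5368709120  # equals 5 * 2**30


def count_fasta_bases(handle):
    """Return (number of records, total sequence length) for a FASTA stream."""
    records = 0
    bases = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records += 1  # header line starts a new record
        else:
            bases += len(line)  # sequence line
    return records, bases


def exceeds_limit(bases, limit=STAMPY_MAX_ENTRIES):
    """True when the base count alone is already above the reported limit."""
    return bases > limit


# Tiny in-memory demo; for a real genome, pass an open file handle instead.
demo = io.StringIO(">contig1\nACGTACGT\nACGT\n>contig2\nGGCC\n")
n_records, n_bases = count_fasta_bases(demo)
print(n_records, n_bases, exceeds_limit(n_bases))
```

For a real file, something like `with open("genome.fasta") as fh: count_fasta_bases(fh)` should work (the file name is a placeholder). The file is read line by line, so memory use stays small even for multi-gigabase genomes.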

I tried to find a solution at least for the 3.6G genome, for example by splitting my file into smaller pieces, but I could not make it work, so in the end I had to use an even smaller genome (2.3G).

Something similar (software limits) happens when running phyluce_probe_run_multiple_lastzs_sqlite with big genomes (my 8.3G and 8.6G genomes did not work). I have already opened an issue about it.

Best, Jovana.