DaehwanKimLab / hisat-genotype

GNU General Public License v3.0

Hisat-genotype has been running for more than 24h without writing or doing anything... #81

Open GKerdivel opened 3 months ago

GKerdivel commented 3 months ago

Hi, I have been desperately trying to get hisat-genotype to work for weeks, without success. The best I have managed is getting the software to start (a big success already), but it gets stuck at the beginning...

Here is my command line:

hisatgenotype -x genotype_genome --base hla --locus-list A -1 myfastq_R1.fastq.gz -2 myfastq_R2.fastq.gz --assembly -p 40 --pp 40

Here is what happens in the terminal:

Files found: Omitted extracting reads from myfastq_R1.fastq.gz
 A

The terminal has been busy for 24h, but nothing has happened since then...

Two files are created :

Checking the processes, nothing related to hisat-genotype seems to appear and no resources are used...

I would really appreciate some help making this work...

Thanks in advance

Gwenneg

DarioMarzella commented 3 months ago

Hi, I am not one of the developers, but I am also trying to get this software running... From what I understand, you may be requesting far too many cores/threads. You are requesting 40 threads (-p 40) and, for each of those, 40 cores/threads (--pp 40)? It's not really clear to me how parallelization works in this software, but you might actually be requesting 40*40 cores (1600 in total). Unless you are on a very powerful HPC that gives you access to that many cores without you specifying it, I assume the software tries to spawn far too many processes, which makes it idle forever. Try simply removing --pp 40 and see what happens. Also, maybe try it with their test data from the "Typing and Assembly" section, which seems quite small and should allow a quick test. Also make sure you do have access to 40 cores on the computer/cluster you are using. Hope this helps!
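The arithmetic above can be checked with a short sketch. It assumes the worst case, in which the two options multiply; nothing in this thread confirms that hisat-genotype actually behaves that way. The function names `worst_case_workers` and `available_cores` are hypothetical helpers, not part of hisat-genotype.

```shell
#!/usr/bin/env bash
# Hypothetical helper: worst-case worker count IF -p and --pp multiply.
# Whether hisat-genotype really multiplies them is an assumption.
worst_case_workers() {
    local p="$1" pp="$2"
    echo $(( p * pp ))
}

# Number of cores visible to this shell.
# nproc is GNU coreutils; getconf is used as a fallback where nproc is missing.
available_cores() {
    nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
}

requested=$(worst_case_workers 40 40)
cores=$(available_cores)
echo "worst case requested: ${requested}, available: ${cores}"
if [ "$requested" -gt "$cores" ]; then
    echo "warning: worst case exceeds the cores available on this machine"
fi
```

On a cluster, the scheduler may grant fewer cores than the node has, so the job's own allocation is the number to compare against.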

GKerdivel commented 3 months ago

Thanks for your answer @DarioMarzella. I really wonder if anyone has ever seen this tool working ^^ I do have more than 40 cores. In fact, I tried with the defaults at first, but it was getting stuck as well, so I thought maybe it was just slow and tried to speed it up. I started with the -p option, but I saw posts in other issues mentioning that -p is not used in the first stages of the pipeline, which is where I get stuck, so I tried the --pp option as suggested in those posts... I must say I have lost hope, but I will indeed try with their test data at least ^^ What errors/problems do you have?

DarioMarzella commented 3 months ago

Please try using only -p or only --pp, because if you use them together like this I think you are requesting 1600 cores, which I am not sure you have available.

I will maybe post my issue in a separate thread, also because I managed to find a solution. Simply put, the hisat2 folder was empty (as hisat2 is no longer in this repo), so I had to install it separately and then move it into the hisat2 folder in hisatgenotype, and it's working now (yes, it is actually working, quite impressive).
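A quick check for the empty-folder problem described above could look like the following sketch. It assumes the `hisat2` executable sits directly inside the hisat2 folder of the checkout; `has_hisat2` and `HISATGENOTYPE_DIR` are hypothetical names, and the default path is only a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if DIR contains an executable file named hisat2.
has_hisat2() {
    local dir="$1"
    [ -f "${dir}/hisat2" ] && [ -x "${dir}/hisat2" ]
}

# HISATGENOTYPE_DIR is an assumed variable; point it at your own checkout.
HISATGENOTYPE_DIR="${HISATGENOTYPE_DIR:-./hisat-genotype}"
if has_hisat2 "${HISATGENOTYPE_DIR}/hisat2"; then
    echo "hisat2 found in ${HISATGENOTYPE_DIR}/hisat2"
else
    echo "hisat2 missing from ${HISATGENOTYPE_DIR}/hisat2"
fi
```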

GKerdivel commented 3 months ago

Yes, indeed, creating a separate thread for your issue can be useful and can help people. As I mentioned, I already tried with either -p, --pp, or nothing, with the same results... I just launched it with the test files and it worked, though... I think maybe my fastq files are just too big or something... What kind of data did you use?

DarioMarzella commented 3 months ago

I might actually be experiencing the same issue as you now. Previously I used some WES files, which were not too big, and it worked flawlessly. Now I am trying to use some WGS files (roughly 180GB per read direction, so 360GB in total), and indeed it looks like Hisat-genotype stalls forever, barely using one core although I provided 64 threads, and seemingly doing nothing until the walltime is reached.
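One way to test whether input size is what triggers the stall is to run on a small slice of the same files. The sketch below takes the first N reads of a gzipped FASTQ, relying on the fact that a FASTQ record is four lines. `head_fastq` is a hypothetical helper, and it assumes the R1 and R2 files list mates in the same order, so taking the same N from each keeps the pairs in sync.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the first N reads of a gzipped FASTQ
# to a new gzipped FASTQ. One FASTQ record = 4 lines, so N reads = 4*N lines.
head_fastq() {
    local in="$1" out="$2" n_reads="$3"
    gzip -dc "$in" | head -n $(( n_reads * 4 )) | gzip > "$out"
}

# Example usage (file names taken from the command earlier in this thread;
# the subset names and the read count are arbitrary):
#   head_fastq myfastq_R1.fastq.gz subset_R1.fastq.gz 1000000
#   head_fastq myfastq_R2.fastq.gz subset_R2.fastq.gz 1000000
```

The first reads of a WGS file will not necessarily cover the HLA region, so this only shows whether the stall depends on file size, not whether typing succeeds.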