sivico26 closed this issue 3 years ago
Hi,
It sounds like some of the threads may have crashed and Python is waiting for them to finish, but can't because of out-of-memory issues. By the way, minimap2 PAF files seem to take up less RAM when running HapSolo. One thing to check is how many lines are in your PSL file and how large the file is. Can you run ls -lh *.psl?
Also, how many contigs are in your assembly, and what is its current N50? It sounds like HapSolo is being overwhelmed by the number of alignment hits it's getting. Did you also apply a minimum length filter?
With respect to the number of iterations, it is currently set as the number of iterations per thread. I could add an option, say -N, that would take the total number of iterations, divide it by the total number of threads, and take the ceiling. I understand now that perhaps -n is not the most intuitive.
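For the record, the proposed -N behavior would just be a ceiling division (a sketch; the option name and function are hypothetical, not current HapSolo code):

```python
import math

def iterations_per_thread(total_iterations, nthreads):
    """Split a requested total iteration count across worker threads,
    rounding up so the total is never undershot (hypothetical -N behavior)."""
    return int(math.ceil(total_iterations / float(nthreads)))

# e.g. 50,000 total iterations across 24 threads
print(iterations_per_thread(50000, 24))  # 2084
```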
Thank you,
Edwin
With respect to CostFxnDelta, it's the difference between the current costfxn and the previous costfxn. It is used to check whether the search is on a plateau, in which case it randomizes the alignment parameters and restarts the search from that new randomized set.
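The plateau logic, as described, is roughly the following (an illustrative sketch of the idea, not HapSolo's actual code; all names are placeholders):

```python
import random

def plateau_step(cost, prev_cost, params, eps=1e-9):
    """Illustrative sketch: if the cost function stopped changing
    (CostFxnDelta ~ 0), randomize the parameters and restart the search."""
    delta = cost - prev_cost
    if abs(delta) < eps:  # plateau detected: jump to a random parameter set
        params = {k: random.uniform(0.0, 1.0) for k in params}
    return delta, params

delta, new_params = plateau_step(0.5, 0.5, {'PID': 0.7, 'Q': 0.8, 'QR': 1.2})
print(delta)  # 0.0 on a plateau
```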
Hi there, here are some summary statistics of my assembly generated with abyss-fac:
n n:500 n:1000 n:2000 L50 min N75 N50 N25 E-size max sum name
13713 12358 11490 10722 1428 247 55625 112275 213463 168686 1627529 594.3e6 masurca.fasta
I did not filter the assembly. Although it is quite fragmented, I do not think HapSolo is being overwhelmed by a large number of small sequences. Instead, I think the issue might be the high repeat content (~47%) and the homoeologs present (it is an allotetraploid genome with a mainly AABB heterozygous structure). What do you think?
The number of PSL files is 13714, as expected (13713 sequences + 1 that merges them all).
Regarding the iterations per thread, should I then reduce the iterations to, say, 100, to get 2400 if I am using 24 threads?
So is it normal that CostFxnDelta is always 0? Because according to your explanation, that would mean it randomized the parameters at every iteration. Is that correct?
If the alignments are the problem I might try minimap2 to see if we see a difference here.
Okay, so I ran HapSolo using .paf input. For the record, I used the second minimap2 line to make the alignment:
minimap2 -t 36 -P -k19 -w2 -A1 -B2 -O1,6 -E2,1 -s200 -z200 -N50 --min-occ-floor=100 ${QRY} ${QRY} > $(basename ${QRY} .fasta)_self_align.paf
The thing is, I got the following error:
Traceback (most recent call last):
File "/home/svillanu/Programs/HapSolo/hapsolo.py", line 642, in <module>
mypddf = CreateMM2AlignmentDataStructure(pafalignmentfile)
File "/home/svillanu/Programs/HapSolo/hapsolo.py", line 485, in CreateMM2AlignmentDataStructure
'chainingscore', 'secondchainingscore', 'approxdivergence', 'lqrhrepseeds'])
File "/home/svillanu/.conda/envs/Hapsolo/lib/python2.7/site-packages/pandas/core/frame.py", line 369, in __init__
arrays, columns = _to_arrays(data, columns, dtype=dtype)
File "/home/svillanu/.conda/envs/Hapsolo/lib/python2.7/site-packages/pandas/core/frame.py", line 6284, in _to_arrays
dtype=dtype)
File "/home/svillanu/.conda/envs/Hapsolo/lib/python2.7/site-packages/pandas/core/frame.py", line 6363, in _list_to_arrays
coerce_float=coerce_float)
File "/home/svillanu/.conda/envs/Hapsolo/lib/python2.7/site-packages/pandas/core/frame.py", line 6420, in _convert_object_array
'columns' % (len(columns), len(content)))
AssertionError: 18 columns passed, passed data had 17 columns
Digging into it, I found that the produced .paf file has 17 columns, while the CreateMM2AlignmentDataStructure function expects 18, which explains the issue.
Then I wondered about the discrepancy and how to solve it, so I checked both the PAF specification and the columns CreateMM2AlignmentDataStructure expects. Apparently, the -P flag in minimap2 is the root of this; per the documentation:
-P: Retain all chains and don't attempt to set primary chains. Options -p and -N have no effect when this option is in use.
What I think is happening is that since minimap2 is not setting primary chains, the distinction between s1 and s2 stops making sense, so minimap2 does not report s2. At least that is what happened when I ran minimap2 with the aforementioned line on my genome: there is no s2 column, leaving the file with 17 columns, which CreateMM2AlignmentDataStructure does not accept.
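For what it's worth, a parser that treats columns 13 and up as SAM-style optional tags (rather than fixed positions) would sidestep this; a quick self-contained sketch, not the HapSolo code:

```python
def parse_paf_line(line):
    """Parse one PAF record: 12 mandatory columns plus a variable
    number of SAM-style tags (e.g. tp, cm, s1, s2, dv)."""
    fields = line.rstrip('\n').split('\t')
    keys = ['qname', 'qlen', 'qstart', 'qend', 'strand',
            'tname', 'tlen', 'tstart', 'tend',
            'matches', 'alnlen', 'mapq']
    rec = dict(zip(keys, fields[:12]))
    # optional tags are present or absent depending on minimap2 options
    # (e.g. -P drops s2), so parse them by name instead of by position
    for tag in fields[12:]:
        name, _type, value = tag.split(':', 2)
        rec[name] = value
    return rec

# a line without s2 (as produced with -P) still parses fine
line = ("ctg1\t1000\t0\t900\t+\tctg2\t1200\t100\t1000\t850\t900\t60\t"
        "tp:A:S\tcm:i:100\ts1:i:500\tdv:f:0.01")
rec = parse_paf_line(line)
print('s2' in rec)  # False
```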
Did minimap2 produce 18 columns when you ran it using -P? I imagine it did, since it is in the README, but I do not see exactly how this varies from genome to genome.
What do you think? Am I missing something?
PS: I am currently running HapSolo with the .paf produced by the first recommended minimap2 line, since it produced 18 columns.
I see. I will look into it then.
I am starting to feel that I am annoying you with these issues, sorry about that. Continuing with the previous step, I got this error when running HapSolo with the 18-column .paf file:
Traceback (most recent call last):
File "/home/svillanu/Programs/HapSolo/hapsolo.py", line 691, in <module>
mybestnscoreslist[i][4][1]) + '_primary.fasta'
File "/home/svillanu/Programs/HapSolo/hapsolo.py", line 412, in CalculateInverseProportion
inversePct = exp(-1.0 * log(myPct, 2))
ValueError: math domain error
Apparently, the error is reproducible. I ran it twice, and on both occasions it emerged at iteration 31. I tried to follow the code to see why the function got a 0 as input to that log, but I did not succeed, to be honest.
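For reference, a guard like the following would avoid the domain error (a sketch based on the line in the traceback, not a patch against the actual code; the floor value is arbitrary):

```python
from math import exp, log

def calculate_inverse_proportion(my_pct, floor=1e-6):
    """Guarded version of the failing call: log(0) raises
    'ValueError: math domain error', so clamp the input first."""
    my_pct = max(my_pct, floor)  # avoid passing 0 (or negatives) to log
    return exp(-1.0 * log(my_pct, 2))

print(calculate_inverse_proportion(0.0) > 0)  # no longer raises
```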
Thanks for your help Edwin.
Hi,
No worries you are not annoying at all. I am happy to help and apologize for the late reply. I have had to take our doggie to the vet for a mass growth in his gums and today took the little one to archery practice outdoors.
With respect to the large amount of RAM being used, it does not appear to be due to the size of the PSL file, but perhaps I could take a look at it if you compress it into a gz file.
It also appears you have found a bug where HapSolo is generating a value of 0.0. I will look into this and correct it in the code. Also, HapSolo doesn't always report a CostFxnDelta of 0; that only happens when it encounters a plateau. When it does not encounter a plateau, it increments forward in the positive direction. With respect to the number of iterations, you could run 100 just to test the run, but I would recommend a total of 50,000 iterations; if you are running 24 cores, then perhaps 2,000 iterations per thread would be good for an actual run. I will perform some test cases, make the changes, and push them to GitHub. Please give me a few hours. I will let you know when I have fixed the error.
Hi,
Please try running HapSolo once more. I have gone ahead and updated both branches to no longer pass values of 0.0 for PID, Q and QR during the initial hill climbing step. I have also fixed the error where 18 columns are always expected from the PAF file. Please let me know if you are still having issues.
Thank you for letting me know about these bugs. Your help and feedback is absolutely appreciated! 👍
Thank you again,
Edwin
Hi,
I hope all is well. I was wondering if you were still having the same issues.
Please let me know.
Thank you,
Edwin
Hi Edwin, thanks for your reply. These have been busy days, so I have not had enough time for this particular task.
As a preliminary report, I changed the minimap2 line to the third one recommended and the job is currently running. I hope to have news tomorrow.
Best
Hi,
No worries. Either line should work now, as HapSolo now accepts both column counts generated by minimap2. Looking forward to the news, bad or good. It's good that you caught a few bugs, as it allows me to fix them and ultimately ship a better software package.
Thank you again for your help and feedback.
Edwin
Okay so here is what I got:
When I use the first minimap2 line to generate the .paf file:
minimap2 -t 36 -k19 -w5 -A1 -B2 -O3,13 -E2,1 -s200 -z200 -N50 --min-occ-floor=100 ${QRY} ${QRY} > $(basename ${QRY} .fasta)_self_align.paf
The result is really small. When I run HapSolo using this file, I still get ValueError: math domain error as before (seemingly in the same place, by the way). I do not know whether to worry about this, as the file seems to be wrong in the first place. I attach it in case you find it useful.
Now, the other two suggested minimap2 lines produce the expected .paf files (~14 GB uncompressed, 3.1 GB gzipped), so I am unable to share them. When I run HapSolo with either of these, I return to the original topic of this issue: the process runs out of memory and eventually ends up as a zombie.
Just out of curiosity, what do you think is driving such high memory consumptions?
I will try filtering out some small sequences and run again to see if I can avoid the memory overflow.
I wanted to add that after filtering sequences shorter than 2000 bp (which removed around 3000 seqs.), the size of the .paf is basically the same (14 GB uncompressed, 3.1 GB gzipped). Thus, it is no surprise that the problem persists with filtering: HapSolo runs out of memory and the process ends up as a zombie.
Hi,
Would it be possible to upload your input files here? I would like to test your run and see where these errors are originating. I don't see why you would still get a math domain error, as I have eliminated the possibility of passing the log function a value of 0.
https://drive.google.com/drive/folders/1j_k31X-UqIDXgR_UwvX8IyOjUyhkrK_y?usp=sharing
If possible, please change the share settings to only you and I after accessing the link.
I can share the .paf and BUSCO files, but since this is unpublished data, I'm afraid I cannot share the assembly itself. Regarding the BUSCO files: the folder is huge (more than 200 GB when I checked yesterday), so which files does HapSolo use? Looking at the code, I saw that the full_report.tsv is needed, but I want to be sure what else is needed before filtering and packaging.
Hi,
You can run
tar czvf buscos.tar.gz contigs/busco/*/*/full_table*.tsv
and upload that. You can also gzip the PAF file. With respect to the assembly, you could just send the contig names and lengths by running
bioawk -cfastx '{print $name"\t"length($seq)}' $myasm > $myasm.lengths
and I can generate a random set of sequences to mimic the size of your assembly.
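For example, generating that mock assembly from a lengths file is straightforward (illustrative only; any random generator works):

```python
import random

def mock_assembly(lengths, seed=42):
    """Generate random FASTA sequences matching a list of (name, length)
    pairs, to mimic an assembly's size without sharing the real data."""
    rng = random.Random(seed)  # seeded for reproducibility
    records = []
    for name, length in lengths:
        seq = ''.join(rng.choice('ACGT') for _ in range(length))
        records.append('>' + name + '\n' + seq)
    return '\n'.join(records) + '\n'

print(mock_assembly([('ctg1', 10), ('ctg2', 5)]))
```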
Are you able to change the share settings so that only you and I can see the shared folder?
Thank you,
Edwin
Apparently, by changing the permissions I excluded myself without uploading the files, hahaha. I have already prepared the files and requested access again.
Thanks Edwin
All right, files uploaded.
The small .paf is the one that produces the math error, while the large one produces the zombie state.
Hi,
Thank you for uploading the files so I can further investigate what is going on. I am currently running HapSolo on the smaller PAF file; so far I am using 1 GB of RAM and all threads are at about iteration 80. I have also noticed that the scores are all at 50,000,000.0, which means it is not finding any contigs with a single complete BUSCO after purging. That default value is used when single BUSCOs are 0, since the cost function contains single BUSCOs in the denominator; this avoids a divide-by-zero error. I have yet to hit the math error, though. I will continue running it, and if I still do not get the error, I will run it again. I did a fresh pull from the minimap2 branch to make sure all new updates are incorporated.
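To illustrate why that default kicks in, here is a toy version of the idea (the weights below are placeholders, not HapSolo's actual formula; only the zero guard and the sentinel value reflect the behavior described above):

```python
def cost_fxn(single, duplicated, fragmented, missing, default=50000000.0):
    """Toy cost with single-copy BUSCOs in the denominator. When no
    contig retains a single complete BUSCO after purging, return a
    large sentinel instead of dividing by zero."""
    if single == 0:
        return default  # the 50,000,000.0 scores seen in the run above
    return (missing + duplicated + fragmented) / float(single)

print(cost_fxn(0, 10, 5, 3))      # sentinel: 50000000.0
print(cost_fxn(100, 10, 5, 5))    # normal case
```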
I did some digging and found that although your assembly has 13713 contigs, only 1661 of them contain complete BUSCOs. This is most likely the problem, as most contigs (>87.8%) do not contain a complete BUSCO. Is it possible some of the BUSCO runs crashed? Could there be some other issue in your assembly? Have you polished your assembly already? My suspicion is that either a majority of the BUSCO runs failed or there was an issue with the parameters of the BUSCO run.
Please let me know your thoughts and also if you find anything.
Thank you,
Edwin
Hi,
I have identified another bug, where the contig name was being pulled from the name of the BUSCO full table file. When a naming convention different from the one recommended in the bash script I provided is used, HapSolo will assign the wrong contig name in the data structure. I believe I have patched that bug and will be testing it now. However, I am still getting no single BUSCOs when I run HapSolo with the updated code. I will investigate further and update you accordingly. Thank you for helping me catch another bug. Hopefully, I will be able to solve all the issues with your run of HapSolo.
Will keep you posted.
Edwin
Hi Edwin,
Thanks for the speed in evaluating this case; very interesting findings. I was also puzzled by the way the contig names were loaded from the BUSCOs and was about to ask. Not sure why I did not mention it.
Regarding the BUSCOs of the assembly, the global value I get is 96.4%, or 1937 complete, in the eudicotiledons_odb10 dataset. I will take the individual reports, check whether I get the same result, and get back to you.
Hi,
Yes, I assumed everyone would use my preprocessing scripts, but even then, I'm sure this bug would have come up eventually, as things can always change. This time I extracted the contig name directly from the run command on the 3rd line of every full table file, and I have fixed the bugs that arose from it. I am running it again now on 10 cores and 5000 iterations, and I am now getting scores. It does appear, however, that the score isn't decreasing much so far, but then again it's only on its 30th iteration. I'll continue running it and let you know.
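Roughly, the new extraction looks like this (a sketch; the header layout and regex are my approximation of BUSCO 3 output, not the committed code):

```python
import re

def contig_from_full_table(lines):
    """Pull the contig name from the run command recorded in a BUSCO
    full_table header, instead of trusting the file's own name."""
    run_line = lines[2]  # 3rd header line holds the command used for the run
    m = re.search(r'-i\s+(\S+?)\.fasta', run_line)
    if m is None:
        return None
    return m.group(1).split('/')[-1]  # strip any directory prefix

header = ['# BUSCO version is: 3.0.2',
          '# The lineage dataset is: embryophyta_odb9',
          '# To reproduce this run: python run_BUSCO.py '
          '-i contigs/ctg0001.fasta -o ctg0001 -l embryophyta_odb9 -m genome']
print(contig_from_full_table(header))  # ctg0001
```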
With respect to the massive amount of memory being used on the larger file, I have some ideas on how to reduce memory usage. I will tackle that after I can get a successful run from the smaller file.
I will keep you posted. Also, please let me know what you find with regard to the number of contigs that contain BUSCOs. In the end it should still be fine, because HapSolo uses the contigs that contain BUSCOs to train on and find optimal purging parameters, which are then applied assembly-wide.
Thank you,
Edwin
Hi Edwin,
Really nice to see the workarounds. Out of curiosity, don't you think the contig names could be extracted from the assemblies instead of the BUSCO files? Or do you have a reason to do it from the BUSCO files? Is it more robust?
Regarding the Global Busco of the assembly, this is what I found:
In extended BUSCO categories (defined by me, haha):
Complete single: 1537 (72.47)
Duplicated: 429 (20.23)
Fragmented: 26 (1.23)
Comp_fragmented: 44 (2.07)
Dup_fragmented: 53 (2.5)
Mul_fragmented: 4 (0.19)
Missing: 28 (1.32)
I should mention that Comp_fragmented means the gene is found complete in one contig and fragmented at least once in another. Similarly, Dup_fragmented means it is found duplicated in some contigs and fragmented in others. Finally, Mul_fragmented just means the gene was found fragmented multiple times.
In traditional Busco categories:
Complete single: 1581 (74.54)
Duplicated: 482 (22.73)
Total complete: 2063 (97.27)
Fragmented: 30 (1.41)
Missing: 28 (1.32)
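The mapping between the two tables checks out arithmetically; here is how I collapsed my extended categories into the traditional ones (a quick sketch using the numbers above):

```python
# Extended categories from the per-contig BUSCO runs (counts above)
extended = {'complete_single': 1537, 'duplicated': 429, 'fragmented': 26,
            'comp_fragmented': 44, 'dup_fragmented': 53,
            'mul_fragmented': 4, 'missing': 28}

# Collapse: a gene complete once (even if fragmented elsewhere) is
# complete-single; complete more than once is duplicated; only-fragmented
# genes stay fragmented.
complete_single = extended['complete_single'] + extended['comp_fragmented']
duplicated = extended['duplicated'] + extended['dup_fragmented']
fragmented = extended['fragmented'] + extended['mul_fragmented']
missing = extended['missing']

print(complete_single, duplicated, complete_single + duplicated,
      fragmented, missing)  # 1581 482 2063 30 28
```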
The results are actually better than I expected regarding completeness, although the duplicates skyrocketed. I did not imagine such a difference between the global BUSCO and the sequence-by-sequence one. However, I have to conclude that the BUSCO files you are working with are fine.
The best explanation I have for your observation (most contigs with no BUSCOs) is the biological one: regions with coding genes tend to be easier to assemble, so they end up in fewer, longer contigs. I think that also means most of my contigs broke in repetitive sequences with no coding genes nearby.
I hope it helps.
Hi,
I have the results for the smaller alignment file.
Writing masurca_1000_0.3082_0.8118to1.3509_0.8784_primary.fasta with score: 0.315059861374
Writing masurca_1000_0.2669_0.2799to6.2778_0.8118_primary.fasta with score: 0.315059861374
Writing masurca_1000_0.7887_0.6800to1.7444_0.8692_primary.fasta with score: 0.315059861374
Writing masurca_1000_0.8060_0.9600to1.0607_0.4431_primary.fasta with score: 0.315059861374
Writing masurca_1000_0.5949_0.4370to3.3012_0.2388_primary.fasta with score: 0.315059861374
What is concerning is the wide range of parameters. This is probably due to the low number of BUSCO hits: over 87% of the contigs have none. As I thought earlier, there seems to be too little data for the model to train on and properly classify althaps...
I will look into implementing optimizations to increase speed, since it took 43279 seconds, or about 12 hours. RAM usage was actually pretty good, in the MBs: Maximum resident set size (kbytes): 112968.
I will try the larger file and check its memory usage.
Hi again,
I was able to successfully load the larger PAF file. I have gone ahead and made some major RAM efficiency changes, with a more than 50% reduction in RAM usage. Initially it was approaching >120 GB of RAM, which is the maximum amount available on my in-house test server; I currently have RAM usage down to 62.5 GB. Still hefty, but I hope to reduce this even further with more updates. I will update the GitHub mm2 branch soon and keep you posted.
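One of the simpler levers here is streaming and pre-filtering the PAF instead of loading it whole; a rough sketch of the idea (not the actual HapSolo code, and the cutoff is arbitrary):

```python
import os
import tempfile

def stream_paf(path, min_alnlen=1000):
    """Read a PAF file line by line instead of loading it into memory,
    keeping only alignments whose block length passes a cutoff."""
    with open(path) as fh:
        for line in fh:
            fields = line.rstrip('\n').split('\t')
            if int(fields[10]) >= min_alnlen:  # column 11: alignment block length
                yield fields[0], fields[5], int(fields[10])

# tiny demo file: one long and one short alignment
demo = ("q1\t10\t0\t9\t+\tt1\t10\t0\t9\t5000\t5000\t60\n"
        "q2\t10\t0\t9\t+\tt2\t10\t0\t9\t100\t100\t60\n")
path = os.path.join(tempfile.mkdtemp(), 'demo.paf')
with open(path, 'w') as fh:
    fh.write(demo)
hits = list(stream_paf(path))
print(hits)  # only the 5000 bp alignment survives the filter
```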
The min score again was 0.315059861374. I believe this is again due to the lack of BUSCOs in >87% of the contigs; with the mosquito example from the paper, ~50% of the contigs contained BUSCOs. I will, however, have to update HapSolo to support the changes in the latest BUSCO software. It currently supports the older version of BUSCO, as stated in the paper.
The best result so far has been the removal of 4,813,394 nt from the assembly.
Thank you,
Edwin
Hi Edwin, thanks for keeping me posted.
Really nice results indeed, especially the RAM savings; I think they make the program more usable for future users. Regarding the results so far, this is a difficult allotetraploid genome with really high heterozygosity (11.67%) and a considerable proportion of repetitive sequences (47%). It would not surprise me if BUSCO alone is not enough to properly separate the haplotypes.
Out of curiosity, I was playing with BUSCO and the global vs. sequence-by-sequence difference in results. I reproduced it in the latest stable release (4.1.1), at least for my dataset. I think at some point this week I will report it to the BUSCO team to see what they think. I can keep you posted on that if you are interested.
Regards
Hi,
Sorry for the late reply. I have updated the minimap2 branch to use less RAM. Also, when I used 48 cores, I was able to run HapSolo in 2 hours and 13 minutes. Were there any issues with BUSCO not completing properly on other contigs?
I have checked all my other datasets, and in each at least 25% of the contigs contain at least one complete BUSCO. Perhaps 15% is too low for HapSolo to be able to estimate the correct alignment parameters for purging. This result is very interesting.
Please let me know if you are able to increase the proportion of contigs with complete BUSCOs. Also, could you try running HapSolo again on your cluster? Feel free to use as many cores as possible, taking into account that you have enough RAM allocated for your run; usually it is 4 GB/core.
Thank you,
Edwin
Hi Edwin,
It is interesting what you tell me about the BUSCOs of other datasets; I will let that sink in. As I said before, summing the results of all the contigs gives even better results than the BUSCO of the whole assembly. I double-checked the logs and folders again, and I did not see anything that indicates a problem. So I think this characteristic is indeed biological, or maybe the assembly is too fragmented.
I will run HapSolo again on my cluster and let you know how it goes.
Regards,
Hi Edwin,
I have already had the chance to rerun HapSolo with the current changes. First of all, I want to congratulate you, because the memory footprint reduction is really significant. Memory consumption was 36 GB most of the time, though I think it peaked at 52 GB before finishing. Either way, it is very different from the hundreds of GB I got in previous runs.
As I was writing this reply, I noticed I had set up just a few iterations (100 x 24 threads = 2400) and not the 50,000 you recommended. I will run the longer one and come back to you when it finishes. Just to let you know, this one took 3 h to complete.
Regards
Hi Edwin,
I reran with a higher number of iterations (2000 x 24 = 48,000), and the process took 7 h and 41 min to complete. The results, though, are somewhat disappointing:
n n:500 L50 min N75 N50 N25 E-size max sum name
13713 12358 1436 500 55224 111867 212939 168206 1627529 596e6 masurca.fasta
13452 12097 1430 500 55568 112108 213386 168565 1627529 594.7e6 asms/masurca_1000_0.9440_0.8950to1.1736_0.4149_primary.fasta
261 261 49 1002 4122 8436 13577 11025 46445 1357609 asms/masurca_1000_0.9440_0.8950to1.1736_0.4149_secondary.fasta
Basically, no sequences were purged. The curious thing is that I got better results in the previous run, when I used only 100 iterations: HapSolo removed around 15 or 16 Mb, as you told me previously. Any idea why?
Anyway, maybe you are right and there are not enough scaffolds to properly train the algorithm. Or maybe it is something else; I am starting to think that this assembly is not as redundant, or does not contain as many haplotypes, as I thought. But that would contradict some solid results I got using genome profiling, so I am still trying to think this through.
Hi,
That is very disappointing to hear. Do you have the scores for the first and second runs? They should be in your stdout files from your cluster output. I honestly think there is just not enough data to train from; perhaps tweaking the BUSCO settings would help. What settings are you using for the odb9 set and for the species? Is there a closer species to use when Augustus does its HMM training? I think if we can get more contigs with BUSCOs in them, it should provide more data to train with. One thing I noticed is that with each step, the delta scores were almost always 0. I even saw huge differentiation in PID, Q and QR values, but all with the same score. This was due to the very low number of BUSCOs in the contigs/scaffolds. I'll dig more into this.
Also, have you finished polishing the assembly? I usually recommend 2x Arrow, 1x Pilon. An extra run of Pilon doesn't hurt, but the benefits are usually slim to none on the second run.
FYI, I have seen massive improvements in BUSCO identification after polishing. I recommend two runs of Arrow or Racon and then a run or two of Pilon.
Also, please make sure you are using the latest version of BUSCO 3, as BUSCO 4 is still not supported. And instead of -sp arabidopsis, have you tried -sp tomato? For our Chardonnay genome we also used arabidopsis. Here are the options I used:
MYLIB="embryophyta_odb9" SPTAG="arabidopsis" OPTIONS="-l ${MYLIBDIR}${MYLIB} -sp ${SPTAG}"
and then your BUSCO run with ${OPTIONS}. This should all be included in the GitHub repo as well. It does look like you used the script I provided for running BUSCO on each contig, but I'm not 100% sure.
I would first try to maximize your BUSCO results on your polished assembly (having a polished assembly is very important) and then try running HapSolo. You don't need to run BUSCO on each contig, just on your whole assembly first; that way you save time and resources. Once you find the optimal set of options, you should be good to go.
All right, I am going to do some polishing and evaluate whether it improves the BUSCO score. To answer some of your questions:
Thanks for the advice, I will return to you after polishing.
Regards
Hi again, Edwin.
I performed two polishing passes of POLCA with the short-read sequencing data I have. You can see in the paper that POLCA and NextPolish are faster and more accurate than Pilon, which has become the reference polisher.
Anyway, here are the results:
Field Complete Single Duplicate Fragmented Missing Total
Raw 2046 1939 107 29 46 2121
Pass1 2049 1941 108 26 46 2121
Pass2 2050 1942 108 25 46 2121
As you can see, the polishing recovered some fragmented genes, which is great. However, I think for our purposes the improvement is rather marginal and would not change our results much. Or what do you think?
Regards
Hi,
It looks like you have most BUSCOs accounted for, at 96.5%. So far I haven't run HapSolo on an assembly that could not be improved, so it's very likely yours is one that can't take advantage of HapSolo's method, since there aren't enough contigs containing BUSCOs for HapSolo to train from. The only way to know for sure is to run BUSCO again on each individual contig, since there was a small increase in complete BUSCOs, but my guess is that the improvement is not enough. This could be because of fragmentation. I did think it might be due to high ploidy, but there should still be many contigs that contain BUSCOs. There are many things we could speculate about, such as the timing of genome duplication or the conservation of euchromatic regions that contain BUSCOs. I also see that the number of duplicate BUSCOs is already very low; perhaps there is not much heterozygosity in the regions that contain BUSCOs. HapSolo depends on heterozygosity occurring in regions that contain BUSCOs, so a genome with a low number of duplicate BUSCOs won't benefit much from HapSolo, because it tries to minimize the number of duplicate and missing BUSCOs.
In short, I agree with you; I also do not think your genome would improve much by running HapSolo. Have you tried purge_dups? It uses constant cutoffs for coverage and alignment similarity. My apologies that HapSolo couldn't be of more use to you. I should update the README to inform users that HapSolo works best when there is a high number of duplicates and/or a minimum percentage of contigs that contain BUSCOs.
Thank you,
Edwin
I will be closing this ticket. Please let me know if your issue is still outstanding.
Thank you again,
Edwin
Hi Edwin,
I apologize for letting time pass by. The genome presents high heterozygosity (according to genome profiling), but I also suspect there is not much of it in the BUSCO orthologs, as you said, hence subtracting power from HapSolo.
We did some preliminary annotation this past month and found that gene duplication is indeed quite high. It seems it is not well captured by BUSCO, even when running scaffold by scaffold.
I appreciate all your collaboration on this case. I think HapSolo is a great tool, but as is common in bioinformatics, there is no one-size-fits-all solution, and this is just an edge case. I did try purge_dups and, although it got better results, they were still minimal (it removed ~3000 scaffolds, but totaling just ~15 Mb).
Again, thanks for everything!
Hi again @esolares,
After preparing both the BLAT and BUSCO inputs, I ran HapSolo on my genome. Here is the command I used:
/home/svillanu/Programs/HapSolo/hapsolo.py -i $genome -p inputs/all_blat.psl -b inputs/busco -t $threads -n 10000 -B 5
The first abnormality arose when the program ran longer than expected (according to the manuscript, minutes or hours; I am talking about days here). At first, I thought it was because I had increased the number of iterations, but then other things arose as well.
I am used to monitoring new programs to see how they behave. I noticed HapSolo is quite memory-greedy; it even consumed all the memory of my big-memory node, going to swap and filling the swap as well (378 GB RAM + 64 GB swap). At that point, this error was thrown to stderr: "OSError: [Errno 12] Cannot allocate memory".
I thought it would die after that. Surprisingly, it did not: it freed around 5% of RAM and continued to run (using 95 or 96% of RAM) almost until the end. I think the memory overflow occurred between iterations 1000 and 2000; I am not sure, but it was near the beginning.
I say almost because in the final days the process became a zombie, just a few iterations before completion. The load on the node has been literally 0 since then. I am not sure exactly when that happened; it was at some point between the 4th and 6th day. Here are the final lines of the stdout log:
The truncation of the last line is weird, but that is how it appears in the log. Not sure why; it does not seem like a Python thing.
The last thing I found weird is the "CostFxnDelta: 0.0". If I understand correctly, HapSolo runs an optimization process, and I interpret this variable as the delta of the optimization function between iterations. It is weird that it does not change at all, although the other values change with the iterations. Just to be clear, CostFxnDelta has been 0.0 since the beginning (literally since iteration 0), and apparently it never changed.
After all this, I thought I had made a mistake providing the inputs. But the BLAT PSL is okay, and as I understand it, the input for BUSCO is the parent directory where the busco### directories are located; is that correct?
The last question I have is: what are the threads doing? From the log, it seems like every thread does N iterations, which in my case amounts to around 240,000 iterations in total, and that sounds like overkill. What is the sensible way of using them?
Worth mentioning that, per the logs, both the BUSCO and BLAT inputs seem to be okay.
I really appreciate your thoughts on all this mess.