Open sherlyn99 opened 1 year ago
Hi,
Sorry for the late reply. In principle I don't think there should be any limit on how many inputs you can pass to --input, though Python's argparse may run into system-wide limitations, as may the OS you're using (e.g. a specific Linux distro).
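For reference, the OS-level limit on Linux is the kernel's ARG_MAX (the total bytes allowed for command-line arguments plus the environment), which you can check directly. A quick sketch; the file list generated here is purely hypothetical, just to make the snippet self-contained:

```shell
# Total bytes allowed for command-line arguments + environment.
# POSIX guarantees at least 4096; modern Linux is typically ~2 MB.
getconf ARG_MAX

# Rough size a file list would occupy when expanded onto the command
# line via $(cat filelist.txt): one byte per character in each path
# plus a separator per path. Placeholder paths for illustration only:
printf 'genome_%d.fna\n' $(seq 1 1000) > filelist.txt
wc -c < filelist.txt   # compare this against ARG_MAX
```

If the expanded list approaches ARG_MAX, the shell will refuse to launch the command before argparse ever sees it.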
Passing in a list of files in a .txt seems fine. I have .tar archive input on the future-features list for CheckM2, but some sections of the code need to be rewritten to avoid tarbombs.
Please let me know if you encounter issues with the workflow; I've never run CheckM2 on that many genomes, so it would be good to know whether it can handle it.
Thank you so much for getting back to me! I am writing to provide an update:
I have been running a job array of ~500 jobs, each containing 2750 genomes. However, I am frequently running into out-of-memory errors. I am currently giving each job 200 GB of memory and 48 hours, and some jobs still fail with an out-of-memory error (exit status 125). Do you have any suggestions for how much memory I should request per job so that the job array can run smoothly? Thank you!
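The batching step behind the job array looks roughly like this (a sketch: the master list is generated here with placeholder paths so the snippet is self-contained, and the `split` flags assume GNU coreutils):

```shell
# Placeholder master list, only so this sketch runs on its own.
printf 'genome_%06d.fna\n' $(seq 1 10000) > all_genomes.txt

# GNU split: 2750 genomes per batch -> batch_000.txt, batch_001.txt, ...
split -l 2750 --numeric-suffixes=0 -a 3 --additional-suffix=.txt \
    all_genomes.txt batch_

# Each array task (e.g. $SLURM_ARRAY_TASK_ID) then runs something like:
#   checkm2 predict -t 30 -i $(cat batch_${SLURM_ARRAY_TASK_ID}.txt) \
#       -o out_${SLURM_ARRAY_TASK_ID} ...
wc -l batch_*.txt
```

Shrinking the per-batch line count here (and raising the array size correspondingly) would be one way to reduce the memory each job needs.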
Hi, I have ~1 million isolate genomes and I want to run CheckM2 to assess their completeness and contamination. I came across #67 and I was wondering what a good way is to pass a large number of genomes into `checkm2 predict`. I am currently doing:

```
checkm2 predict \
    -t 30 \
    -i $(cat) \
    -o \
    --database_path \
    --remove_intermediates
```
1) Is there a limit to the number of files in filelist.txt?
2) Is there a better way to do this other than passing a list of file paths?
Thank you so much!