dr-joe-wirth / phantasm

PHANTASM: PHylogenomic ANalyses for the TAxonomy and Systematics of Microbes
MIT License

NCBI timeout when using multiple markers #12

Closed: astrodea closed this issue 1 year ago

astrodea commented 1 year ago

Hi, I have been trying to use the tool to construct a tree around 2 thermophilic bacteria. I really like your tool's output (as opposed to the enormous trees that GTDB-Tk produces).

However, I am having trouble with the last step of the process, using unknown reference genomes and phylogenetic markers. I have been trying to use all of the phylogenetic markers from the first step that have a score over 0.9; for my 2 genomes, that is around 60 markers. When I run the last step this way, I receive an error that the connection to NCBI has timed out, and the process is terminated. I successfully completed the step using fewer than 10 markers; however, those results do not appear comprehensive. Can you please advise whether there is a way around the NCBI block?

Thanks in advance.

dr-joe-wirth commented 1 year ago

Hmm... this is tricky. The current workflow sends a single blastp query to NCBI to search against the nr database. I suspect one of the genes you picked is exceeding NCBI's computation limit. Can you share the exact error message you received and the log file?

astrodea commented 1 year ago

Attachment: phantasm.log. I have attached the log for the last 3 runs I tried. As far as I can see, the workflow manages to run all of the markers through the nr database, and the NCBI connection fails at the step immediately after that?

dr-joe-wirth commented 1 year ago

Thanks for bringing this to my attention. I am able to recreate the bug on my end, so I can see what is happening. This problem is specific to using a lot of phylogenetic markers. I intend to fix this issue with my next release. I plan to have a new release sometime around September 1.

dr-joe-wirth commented 1 year ago

Modify PHANTASM.findMissingNeighbors.__linkAssembliesWithBlastpResults to limit the search string to 10,000 keywords per request.
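For readers hitting the same limit, here is a minimal sketch of the kind of change described above: splitting one oversized search string into several requests of at most 10,000 keywords each. This is not PHANTASM's actual code. The helper names `chunk_keywords` and `build_search_strings` are hypothetical, and joining terms with `OR` (Entrez query syntax) is an assumption about how the search string is assembled.

```python
from typing import Iterator, List

# Cap taken from the fix note above; everything else here is hypothetical.
MAX_KEYWORDS_PER_REQUEST = 10_000


def chunk_keywords(keywords: List[str],
                   max_per_request: int = MAX_KEYWORDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive slices of `keywords`, each holding at most `max_per_request` items."""
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    for start in range(0, len(keywords), max_per_request):
        yield keywords[start:start + max_per_request]


def build_search_strings(keywords: List[str],
                         max_per_request: int = MAX_KEYWORDS_PER_REQUEST) -> List[str]:
    """Build one search string per request instead of a single huge one.

    Assumption: terms are combined with Entrez-style OR. Duplicate keywords
    (e.g. the same accession hit by several markers) are dropped first,
    preserving their original order.
    """
    unique = list(dict.fromkeys(keywords))
    return [" OR ".join(chunk) for chunk in chunk_keywords(unique, max_per_request)]


if __name__ == "__main__":
    # Synthetic accessions standing in for blastp hits from many markers.
    hits = [f"WP_{i:09d}.1" for i in range(25_000)]
    queries = build_search_strings(hits)
    print(len(queries))  # 3 requests: 10,000 + 10,000 + 5,000 keywords
```

Each string would then be sent as its own request, so the number of keywords in any single request stays bounded no matter how many markers are selected.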

dr-joe-wirth commented 1 year ago

@astrodea can you please share the file phylogeneticMarker.blastp with me so that I can easily debug the problem?

astrodea commented 1 year ago

Attachment: phylogeneticMarker_blastp.txt. Hi, I am attaching the file now. I had to change the extension of the phylogeneticMarker.blastp file to be able to send it here, but it should be readable.

dr-joe-wirth commented 1 year ago

Thanks so much, this is very helpful.

dr-joe-wirth commented 1 year ago

@astrodea this bug has been fixed in the most recent release