TrinityCTAT / CTAT-VirusIntegrationFinder

BSD 3-Clause "New" or "Revised" License

Handling for the scenario where `blastn -query` returns no results #45

Open allaway opened 2 years ago

allaway commented 2 years ago

Hi there,

Yesterday I ran into a situation where `prep_genome_lib/ctat-vif-lib-integration.py` errored on line 79, `df = pd.read_csv(blastn_outfile, sep="\t", header=None, usecols=[1,2,8,9])`, when the previous blastn step outputs a file with no contents.

This PR checks the number of lines in the blastn output file. If the file is empty, it bypasses the processing and instead creates a dummy file containing only the column headers for use downstream.

I am a bit of a Python novice, so please let me know if I've done something silly!
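For illustration, here is a minimal sketch of the kind of check described above. It is not the patch from this PR: it uses the standard-library `csv` module instead of pandas so that it runs without extra dependencies, the function names and the `HEADER` column names are hypothetical, and it assumes the blastn output is in the tabular 12-column layout (so that columns 1, 2, 8 and 9 are the subject ID, percent identity, subject start and subject end).

```python
import csv
import os

# Hypothetical header names; the real script's column headers are not shown in this thread.
HEADER = ["subject_id", "pct_identity", "subject_start", "subject_end"]


def load_blastn_hits(blastn_outfile, usecols=(1, 2, 8, 9)):
    """Return the selected columns of each hit, or [] if blastn reported nothing."""
    # A missing or zero-byte file means there are no hits to parse.
    if not os.path.exists(blastn_outfile) or os.path.getsize(blastn_outfile) == 0:
        return []
    with open(blastn_outfile, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        # Skip blank lines; keep only the requested columns from each row.
        return [[row[i] for i in usecols] for row in reader if row]


def write_hits(hits, out_path, header=HEADER):
    """Write the header plus any hits; with no hits this yields a header-only file."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(hits)
```

With this shape, the downstream step always receives a file with the expected header, whether or not blastn found anything. When staying with pandas, another option would be to catch `pandas.errors.EmptyDataError`, which `pd.read_csv` raises for a file with no contents.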

brianjohnhaas commented 2 years ago

oh, interesting. Thanks for pointing this out. Leave the PR in, but I might handle it differently when I put out a future release.


--

Brian J. Haas The Broad Institute http://broadinstitute.org/~bhaas http://broad.mit.edu/~bhaas

brianjohnhaas commented 2 years ago

Also, just in case, please be aware that we provide a virus database collection to use with ctat-vif, but go ahead and use it on your own set too if that's of interest.


allaway commented 2 years ago

> oh, interesting. Thanks for pointing this out. Leave the PR in, but I might handle it differently when I put out a future release.

Sounds good, I figure there is probably a better way to handle this. :)

> Also, just in case, please be aware that we provide a virus database collection to use with ctat-vif, but go ahead and use it on your own set too if that's of interest.

Thanks! I did see that.

I am revisiting a 2021 summer intern's project. IIRC (and I certainly might be wrong about this), the collection of viruses provided at that time was only HPV, or maybe HPV plus some other viruses?

I believe she retrieved all of the available human viral sequences from NCBI and used those, so I am trying to use the same set for consistency's sake, but I'll check out the virus fasta you have provided as well.