AnantharamanLab / VIBRANT

Virus Identification By iteRative ANnoTation
GNU General Public License v3.0

VIBRANT_AMG_individuals.tsv not found #7

Closed AlanWongHonLun closed 4 years ago

AlanWongHonLun commented 4 years ago

Hi Kris,

I was using VIBRANT and got an error:

Traceback (most recent call last):
  File "/srv/scratch/z3336178/anaconda/envs/vibrant/bin/VIBRANT_run.py", line 623, in <module>
    with open('VIBRANT_AMG_individuals_' + str(base) + '.tsv', 'r') as annotations:
FileNotFoundError: [Errno 2] No such file or directory: 'VIBRANT_AMG_individuals_13_contigs_1000.tsv'

I installed VIBRANT through Anaconda and downloaded the databases with download-db.sh. I used Prodigal beforehand to translate the nucleotide FASTA into an amino acid FASTA file. Here is my command line:

source activate vibrant
VIBRANT_run.py -i 13_contigs_1000.faa -t 16 -f prot

The log file is empty so I am not sure why no AMGs were identified.

Thanks for your time and let me know if you need more info!

Cheers

Alan

KrisKieft commented 4 years ago

Hi Alan,

I tested the conda download and everything appears to be working. Running prodigal and then the given command should also work. The log file should always display a specific template and never be empty if VIBRANT completed, but due to the error it was never written to.

There should be two different folders containing HMM outputs. Can you check whether those are empty or contain annotation information? It's possible that the databases were never set up properly or that the input proteins could not be read; if either of those is the case, the HMM files will be empty. After running the "download-db.sh" script you should have seen a prompt stating "VIBRANT v1.0.1 is good to go!". Can you also send an example of a definition line from your protein FASTA file? Thanks.

Kris
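The check Kris asks for (are the HMM output folders empty, or do they hold annotations?) can be done by hand, but a small script makes it repeatable. The following is a minimal sketch, not part of VIBRANT. It assumes the HMM output tables are plain text and that comment lines start with "#", as in HMMER's tabular output; the folder paths passed in are whatever your run produced, and the function names are hypothetical.

```python
from pathlib import Path


def count_hit_lines(table_path):
    """Count lines that are neither blank nor '#' comments."""
    hits = 0
    with open(table_path, errors="replace") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                hits += 1
    return hits


def summarize_outputs(folder):
    """Map each regular file directly inside `folder` to its hit-line count."""
    return {
        path.name: count_hit_lines(path)
        for path in sorted(Path(folder).iterdir())
        if path.is_file()
    }
```

Calling summarize_outputs() on each of the two HMM output folders and looking for files with a count of 0 shows quickly whether the searches produced anything.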

Handymanalan commented 4 years ago

Hi Kris,

Thanks for the reply. Neither HMM output folder is empty; both contain annotation information. Do you mean you need the first few lines of the protein faa file? Here they are:

>k141_4306192_1 # 2 # 94 # -1 # ID=3_1;partial=10;start_type=ATG;rbs_motif=TAAAAA;rbs_spacer=11bp;gc_cont=0.419
MRTHYLSWLIMSAFVMMLVSCGSGDKKDGDS
>k141_4306192_2 # 285 # 797 # 1 # ID=3_2;partial=00;start_type=ATG;rbs_motif=AACAA;rbs_spacer=15bp;gc_cont=0.466
MNQFSTTMKHLLLVITALCISLFSYGQSATPELISTAGDHYESGNIQVSWSVGELMIDTYTGTNNILTQGFHQSDYQIIV
EREQPGIDWTIEAYPNPTTDQITVSVSDFDKLGNSQISLTDLSGKTLMIKELTGSKTNLDVSHFAAGTYFLTVLDANNRW
LKSFKMIKAE*

I tried it on some viral genomes (nucleotide FASTA files) before and it worked. It is kind of weird that it didn't work this time with the assembled metagenomic contigs file.

Thanks!

Alan

KrisKieft commented 4 years ago

My next guess would be that something on the system end malfunctioned, such as a thread dying, so that the AMG file was never created. I've also seen runs fail if a computer goes to sleep during a run, or because of a broken pipeline on a server (e.g., not logging out of the server during a run). Did you try deleting any temp/analysis files that may remain and re-running with the same command? Let me know if that works; if not, I'll keep working on it.

Kris

Handymanalan commented 4 years ago

Hi Kris,

I ran it again and it came up with the same error. I deleted any temp/analysis files. I will install VIBRANT through github instead of Anaconda and see if it works.

Alan

Handymanalan commented 4 years ago

Hi Kris,

VIBRANT installed through Github worked. Thanks!

Does it work on both assembled metagenome contigs and archaeal/bacterial MAGs?

Cheers

Alan

KrisKieft commented 4 years ago

Hi Alan,

Great to hear! I'll keep an eye out for issues that arise from Anaconda downloads. Yes, it will work on contigs, MAGs and whole bacterial/archaeal genomes. However, there is a small stipulation for MAGs. Usually this isn't an issue, but there is no way to indicate to VIBRANT which contigs belong together in a single bin (MAG). Therefore each contig of a MAG is queried separately. I say this usually isn't an issue because an entire MAG doesn't need to be considered as a virus, just the individual pieces. The issue would be if you have a virus that you know is in multiple contigs. Hope that helps.

Kris
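Because each contig of a MAG is queried separately, any link between a contig and its bin has to be kept outside VIBRANT. One way to do that is sketched below, assuming each MAG is stored as its own FASTA file and that the contig ID is the header text up to the first whitespace. The function name is hypothetical and nothing here is a VIBRANT feature.

```python
from pathlib import Path


def contig_to_bin(mag_fastas):
    """Map each contig ID to the stem of the MAG FASTA file it came from."""
    lookup = {}
    for fasta in mag_fastas:
        bin_name = Path(fasta).stem
        with open(fasta) as handle:
            for line in handle:
                if line.startswith(">"):
                    # Contig ID = header text up to the first whitespace.
                    fields = line[1:].split()
                    if fields:
                        lookup[fields[0]] = bin_name
    return lookup
```

With such a lookup, per-contig results can be grouped by bin afterwards, which helps in the case Kris mentions where one virus is spread over several contigs.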

Handymanalan commented 4 years ago

Hi Kris,

The same problem has occurred again with the version downloaded from GitHub: the "AMG .tsv not found" error.

Cheers

Alan

KrisKieft commented 4 years ago

Hi Alan,

Unfortunately I'm not really sure what is going on. You might not be the only person to run into this issue, but you're the only one to bring it to my attention. Are you running VIBRANT on a personal computer, lab server, or shared computing cluster? Or a different platform? How often does this error occur (every other run, basically every run, infrequently, etc.)?

Kris

Handymanalan commented 4 years ago

Hi Kris,

I was running VIBRANT on a computing cluster. I ran it several times and it gave me the same error every time. The error occurs when I run VIBRANT on assembled contigs, but when I ran it on phage genomes generated by MARVEL, VIBRANT worked.

Cheers

Alan

KrisKieft commented 4 years ago

Interesting. This at least narrows it down a little, especially since the error occurs at the same point every time. To my knowledge VIBRANT doesn't have any issues with input file format, but maybe I missed something and MARVEL is altering the format such that VIBRANT will work. When you say "phage genomes generated by MARVEL", are you having MARVEL assemble genomes using metaSpades or are you just taking the FASTA files from results/phage_bins? Are you able to send me an example sequence or a few sequences that you have been inputting to VIBRANT? If you are ok with doing that then you can send a file to my email: kieft@wisc.edu. I understand this is a lot of hoops to jump through to get VIBRANT running so thank you for being patient!

Kris

Handymanalan commented 4 years ago

Thanks Kris,

Yes, they are the FASTA files from results/phage_bins. I will send you both FASTA files: the one that worked and the one that did not.

Cheers

Alan

KrisKieft commented 4 years ago

Hi Alan,

I believe I resolved the issue. VIBRANT has been updated to v1.1.0 and it should work for you now. GitHub is updated but Anaconda is not, though it should be soon. I'm going to close this issue but definitely let me know if you're still running into issues.

Strangely enough, I was able to replicate that an issue occurred, but I was not able to reproduce the same error. I'm going to assume it was the same issue, but I never saw "VIBRANT_AMG_individuals.tsv not found". From what I could figure out, the error happened because there were zero viruses in your contigs.fa.txt file and VIBRANT was exiting at a point that caused an error. I smoothed out VIBRANT's exit and I think that fixed it. VIBRANT shuffles the input scaffolds randomly, so the error occurred only when the scaffolds were in a particular order; there were few enough scaffolds that this order came up often. The MARVEL.fasta file was behaving differently because it had viruses and was not exiting at that step. Again, let me know if you are still getting an error, but I think I fixed it.

Kris
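The failure mode described above, where a later step opens a results file that an earlier step never wrote, can be illustrated with a small guard. This is only a sketch of the general idea under that assumption; it is not the change made in VIBRANT v1.1.0, and read_amg_table is a hypothetical helper name.

```python
import os


def read_amg_table(path):
    """Return the lines of an AMG table, or an empty list if it was never written."""
    if not os.path.exists(path):
        # Nothing was written, e.g. because no viruses were identified upstream.
        return []
    with open(path, "r") as annotations:
        return [line.rstrip("\n") for line in annotations]
```

A caller can then treat an empty list as "no AMGs to report" and finish cleanly instead of stopping with FileNotFoundError.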

morgvevans commented 4 years ago

Hi, I am really interested in using this tool but having issues downloading with both methods. I'm using a supercomputing center & the command line has python2.7 as the default, and I cannot sudo so I can't update to python3.5. So the GitHub version won't work. I have tried several ways of installing via conda, including passing python=3.5 as part of the install 'conda install -c bioconda vibrant==1.2.0 python=3.5'. What results every time is a ton of package conflicts and 'note that strict channel priority may have removed packages required for satisfiability', and the package doesn't fully install (I can run download-db.sh but I get a ton of errors). I've tried setting channel priority to different settings with no luck. Maybe this is more of a conda error than anything but I typically don't have issues installing other programs. If anyone can provide assistance that would be great. I am running miniconda3.

KrisKieft commented 4 years ago

Hi,

Without seeing the exact errors I'm not entirely sure what is going on, but I would guess it's just python3 code trying to execute using python2. VIBRANT is not at all compatible with python2. Do you know if python3 is installed? If python3 is installed then you should be able to just write python3 in front of the command. E.g., python3 VIBRANT_run.py -i input.fasta. Typically the GitHub download is easier to deal with (just in my opinion). A quick check if you have python3 is to type into the command line python3 --version. Unfortunately if you don't have python3 then VIBRANT cannot be used.

Kris
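Besides typing python3 --version, the interpreter that actually runs a script can be checked from inside Python. The snippet below is a small sketch written to run under both Python 2 and Python 3, so that it can report the problem rather than fail with a syntax error. The function name is hypothetical.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3 or newer."""
    return version_info[0] >= 3


# Report which interpreter is running this script.
if is_python3():
    print("Python %d.%d detected: OK for Python 3 code" % tuple(sys.version_info[:2]))
else:
    print("Python %d.%d detected: Python 3 is required" % tuple(sys.version_info[:2]))
```

Running it with the same interpreter that launches VIBRANT_run.py shows whether Python 2 is being picked up by default.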

mTangherlini commented 3 years ago

Hi, I am experiencing the same issue on both versions 1.0.1 and 1.2.1 (freshly upgraded). I'm running VIBRANT on a HPC cluster and it still halts after throwing the following error:

  File "/opt/VIBRANT/VIBRANT_run.py", line 650, in <module>
    with open(str(out_folder)+'VIBRANT_AMG_individuals_' + str(base) + '.tsv', 'r') as annotations:
FileNotFoundError: [Errno 2] No such file or directory: 'VIBRANT/VIBRANT_AMG_individuals_phages_to_test.tsv'

I ran many tests with reference viral genomes, prokaryotic MAGs and viral contigs from metagenomes, but I always end up with this error. What's weirder is that viral genomes with 0 AMGs yield a perfect output, whereas viral genomes with AMGs (the AMGs can be seen in the annotations) yield errors. What should be done to fix this?

KrisKieft commented 3 years ago

Hi,

This was a known error with v1.0.1 and is caused by incorrectly reading AMG annotations. In that regard it makes sense that no error would be produced on genomes without AMGs. However, this should have been resolved in v1.2.1.

Which method did you use for installing VIBRANT (I assume github since it's in /opt)? Can you run VIBRANT_run.py --version and ensure that v1.2.1 is the default version on the cluster?

mTangherlini commented 3 years ago

Hi Kris, Thanks for the quick reply. Yes, I can confirm both that VIBRANT was installed from GitHub and that the version reported is 1.2.1.

Michael

KrisKieft commented 3 years ago

Are you specifying a -folder flag? One quick idea to try would be specifying an output location in case you're having permissions issues on the cluster that aren't showing up. Try -folder to a location that you know you have read/write permissions. I suggest testing it out on example_data/Podoviridae_KJ183192.1.fasta. That genome encodes an AMG.
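Whether an output location is really writable can be tested before starting a long run. The sketch below tries to create and remove a temporary file in the target folder, which catches cases where the permission bits look fine but writes still fail. It is an illustration only and the function name is hypothetical.

```python
import tempfile


def can_write_to(folder):
    """Return True if a file can actually be created and removed in `folder`."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=folder):
            pass
        return True
    except OSError:
        # Covers a missing folder, denied permissions and read-only file systems.
        return False
```

If can_write_to() returns False for the folder given to -folder, the missing AMG file is more likely a permissions problem than a VIBRANT problem.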

mTangherlini commented 3 years ago

Yes, I have tried all the possible combinations, including utilizing the -folder flag, yet I get the same errors. However, your example phage works smoothly. I guess there's something in my contigs, then. Could it be some kind of unwanted character in scaffold names?
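To follow up on the question about unwanted characters, scaffold names can be scanned before a run. The sketch below flags any name containing something other than letters, digits, underscore, dot or hyphen. That "safe" set is an assumption chosen for illustration, not a rule documented by VIBRANT, and the function name is hypothetical.

```python
import re

# Assumed conservative character set; adjust to taste.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_.\-]+$")


def unusual_headers(fasta_path):
    """Return scaffold names (header text up to the first whitespace) with other characters."""
    flagged = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                if not SAFE_NAME.match(name):
                    flagged.append(name)
    return flagged
```

An empty result does not prove the names are acceptable to every tool, but a non-empty one gives a short list of scaffolds to inspect first.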

KrisKieft commented 3 years ago

Can you email me a couple of your test scaffolds? If you are ok with doing that. kieft@wisc.edu

AY-LIANG commented 3 years ago

Hi, I hit the same error with VIBRANT v1.2.1, installed via conda. At first I ran VIBRANT on the assembled contigs, and it seemed to generate all the output files, but I got the error "Error: no input sequences to analyze."

Then I saw issue #20, so I split my FASTA file into 10 parts using seqkit. When running VIBRANT on the split files, I always get the AMG file error in the log files, for example: FileNotFoundError: [Errno 2] No such file or directory: 'result/001/VIBRANT_AMG_individuals_gajewski_filter_contigs.part_001.tsv'

I used the -folder flag and made sure that I have permissions for the directory. What should I do to fix this?

KrisKieft commented 3 years ago

Hi,

Please let me know which version you are running and how large your input files are. If they are small can you email me the file? The issue may be with the sequence names.

AY-LIANG commented 3 years ago

Hi, the VIBRANT version is 1.2.1, installed via conda. My input file is about 8G, and the split files are about 800M each. I think that is too large to email, but the headers of my input file look like the following:

NODE_7061_length_5249_cov_9.668656
NODE_6_length_398501_cov_181.261385
NODE_21_length_268229_cov_118.919608
NODE_25_length_254595_cov_115.325921
NODE_27_length_251184_cov_121.472697
NODE_31_length_235327_cov_121.740721
NODE_39_length_216736_cov_115.837941
NODE_64_length_188903_cov_193.614431
NODE_72_length_181588_cov_212.313040
NODE_80_length_174902_cov_69.028248

KrisKieft commented 3 years ago

Ok great, there shouldn't be any issues with your definition lines. Sometimes certain characters can cause issues.

Are you able to track any error outputs generated during the run? VIBRANT does not log errors if they are not specific to the program itself. These can be errors such as running out of memory or incorrect versions. How much memory does your machine have? Can you verify that your dependency versions are correct, such as numpy, pandas, sklearn, hmmer, etc.
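The dependency check Kris asks for can be scripted. The following is a minimal sketch that assumes Python 3.8 or newer (for importlib.metadata). The package names passed in are whatever the environment is expected to contain, and the command-line tools are looked up only by name on PATH. Both function names are hypothetical.

```python
import shutil
from importlib import metadata


def installed_versions(packages):
    """Map each Python package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def tools_on_path(tools):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}
```

For example, installed_versions(["numpy", "pandas", "scikit-learn"]) and tools_on_path(["hmmsearch", "prodigal"]) give a quick picture of the active environment. Which versions VIBRANT actually requires should be taken from its README.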

AY-LIANG commented 3 years ago

I did not see any error during the run with the split files; I only found the AMG file error in the log files. When I run the whole file at once, I see the error

"Error: no input sequences to analyze."

but no error is reported in the log files.

My machine has 252G of memory, and the dependency versions are:

numpy 1.19.5
pandas 0.25.3
scikit-learn 0.21.3
hmmer 3.3.2

I tried running VIBRANT on another input file, which is about 2G, and it worked well. Maybe this is related to file size?

KrisKieft commented 3 years ago

I think there may be a memory spike with sorting the data within the code. This is something I need to look at in more detail. I would suggest running your dataset in batches separately.
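Splitting a large FASTA file into batches, as suggested here, can be done with seqkit or with a few lines of Python. The sketch below distributes records round-robin across a fixed number of output files. It balances the number of records, not the total sequence length. The output naming pattern and the function name are assumptions made for illustration.

```python
def split_fasta(fasta_path, n_parts, prefix):
    """Distribute FASTA records round-robin into n_parts files; return their paths."""
    paths = ["{}.part_{:03d}.fasta".format(prefix, i + 1) for i in range(n_parts)]
    outputs = [open(path, "w") for path in paths]
    try:
        current = -1  # no record seen yet
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # A new record starts: move on to the next output file.
                    current = (current + 1) % n_parts
                if current >= 0:
                    outputs[current].write(line)
    finally:
        for out in outputs:
            out.close()
    return paths
```

Each resulting file can then be given to VIBRANT as a separate run.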

AY-LIANG commented 3 years ago

Hi, I split my FASTA file into 4 parts and ran VIBRANT on each separately. Two of them gave perfect results, but the remaining two still gave the AMG file error.

KrisKieft commented 3 years ago

If it's not too much of a hassle can you try installing VIBRANT via GitHub rather than conda? This is as simple as downloading the files from GitHub and it's ready to run (details in the README). If you know where the current databases are installed from your conda install then you can actually use the same ones you have and specify the folder with the -d flag.

AY-LIANG commented 3 years ago

Hi, I installed VIBRANT via GitHub and it worked well. Thanks for your help!