yue-clare-lou closed this issue 3 years ago
Hi Clare,
I'm actually not sure why it would have broken here. This step comes after virus identification, when the information for all viruses is used to build a GenBank file. If scaffold_X was given a quality then it should have processed correctly. Did this only happen to one of the input files? Since virus identification did finish and it broke at post-processing, you can use the information in the .faa/.ffn files, or the list of virus names in the .txt file, to grab the virus sequences. That of course isn't the best solution if the error occurred for many input files.
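As a rough illustration of the workaround above, here is a minimal Python sketch that pulls records out of a FASTA file given a list of names. It assumes the names file has one name per line and that each name matches the full definition line without the leading ">"; the function name `extract_sequences` is hypothetical and not part of VIBRANT.

```python
def extract_sequences(fasta_lines, wanted_names):
    """Yield (header, sequence) for FASTA records whose header is in wanted_names.

    Assumption: each wanted name equals the full definition line, minus ">".
    """
    wanted = {name.strip() for name in wanted_names if name.strip()}
    header, chunks = None, []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one if it was requested.
            if header in wanted:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    # Emit the last record in the file if it was requested.
    if header in wanted:
        yield header, "".join(chunks)
```

Both arguments can be open file handles, since the function only iterates over lines.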
Here are some questions that may help me to figure it out:
Which version of VIBRANT are you running?
Are there any special characters in the sequence names, such as a pipe symbol ( | )?
Are there multiple sequences called scaffold_X in the same file?
Did any files with the same name, within the same folder, run at the same time?
Kris
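Two of the questions above (special characters and duplicate names) can be checked with a short script. This is only a sketch: the function name `check_headers` and the default set of "suspicious" characters (tab and pipe) are assumptions for illustration, not a list of what VIBRANT actually rejects.

```python
from collections import Counter


def check_headers(fasta_lines, suspicious=("\t", "|")):
    """Return (headers containing suspicious characters, duplicated headers)."""
    # Keep the definition line without ">" and without the trailing newline.
    headers = [line.rstrip("\n")[1:] for line in fasta_lines if line.startswith(">")]
    flagged = [h for h in headers if any(ch in h for ch in suspicious)]
    duplicates = [h for h, count in Counter(headers).items() if count > 1]
    return flagged, duplicates
```

Passing an open file handle for the input FASTA works, because only lines starting with ">" are inspected.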
Hey Kris,
Which version of VIBRANT are you running?
I am currently using version 1.2.1.
Are there any special characters in the sequence names, such as a pipe symbol ( | ).
No. This is an example seq name of the scaffold that triggered the error message:
>uvig_367585 SRR1761699_976 length_23967_VirSorter_cat_2
Are there multiple sequences called scaffold_X in the same file?
No
Did any files with the same name, within the same folder, run at the same time?
No
The database I am running VIBRANT on is the Gut Phage Database (https://doi.org/10.1016/j.cell.2021.01.029). I have run VIBRANT twice on this dataset, and both times it stopped with the same error, except that the scaffold that triggered it was different each time (uvig_367585, uvig_456365). In both runs, these two scaffolds were rated by VIBRANT.
I want to extract all provirus sequences, so I'd like to use the *.phages_lysogenic.fna file specifically. Both *.phages_lysogenic.faa and *.phages_lysogenic.ffn only contain gene sequences, so they are not ideal in my case.
I'll download the database and look into it.
Thanks a lot!
This is how I run VIBRANT: VIBRANT_run.py -i GPD_sequences.fa -folder VIBRANT -t 48
I am currently running VIBRANT on GPD database using the version v1.0.1. I wonder whether the error that I ran into has something to do with a specific version.
v1.0.1 of VIBRANT? If so then that is the likely error. The initial releases (v1.0.0 and v1.0.1) had a few bugs. To my knowledge v1.2.1 is fully stable.
I ran VIBRANT twice using v1.2.1 on the Gut Phage Database and I received the same type of error that I pointed out earlier.
I therefore just switched to v1.0.1 of VIBRANT to see if I will run into the same error. This is currently running.
In all honesty v1.0.1 has a couple major issues and likely is not worth running. But if you're curious you can leave it running to see if you get the same error.
I see, thanks for letting me know. I am just curious, so I will let it run. I will let you know whether I run into the same error or not.
Hey fyi - when using v1.0.1, I also ran into the same issue but it was triggered by a different scaffold ('uvig_338582'). I wonder if it is because the sequence names from the GPD don't get recognized by VIBRANT when it is trying to build the genbank file?
Update: I think I know why. The sequence names from the GPD contain both \t and spaces, and it is the \t that is causing VIBRANT to crash.
Looks like it could be tabs? VIBRANT separates some information by tabs and assumes there will not be tabs in the definition lines of sequences. It can handle spaces and most things, but tabs may be an issue (I could be wrong here). Using grep it looks like every sequence has a tab. Try replacing tabs in the file and running again. An option to do this is with sed: cat GPD_sequences.fa | sed 's/\t/~/g' > GPD_sequences.no-tabs.fa. Replace ~ with whatever you want to replace the tabs with. If this solves the issue then I'll update the README.
Hey yah it is a tab issue. I replaced tab with space and no more errors. Thanks!
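For anyone whose sed does not interpret \t, the same cleanup can be sketched in Python. This version touches only definition lines (those starting with ">") and defaults to a space as the replacement, matching the fix reported above. The function names are hypothetical; the file names in the usage note are the ones used in this thread.

```python
def replace_header_tabs(fasta_lines, replacement=" "):
    """Yield FASTA lines with tabs replaced in definition lines only."""
    for line in fasta_lines:
        if line.startswith(">"):
            yield line.replace("\t", replacement)
        else:
            # Sequence lines are passed through unchanged.
            yield line


def clean_fasta(src_path, dst_path, replacement=" "):
    """Write a copy of src_path whose definition lines contain no tabs."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(replace_header_tabs(src, replacement))
```

Usage would be something like `clean_fasta("GPD_sequences.fa", "GPD_sequences.no-tabs.fa")`, followed by rerunning VIBRANT on the new file.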
Hello,
I ran VIBRANT on ~142k fasta files that I know for sure contain a sufficient number of phage scaffolds. When the run finished, I noticed a couple of issues:
I wonder if these two issues have something to do with the error that appeared in the VIBRANT_log_run file:
This scaffold_X was given a medium quality draft rating by VIBRANT and was assigned as lysogenic by VIBRANT. The same scaffold_X was scored as high quality by CheckV.
Thanks, Clare