Closed jmartin77777 closed 1 year ago
Hi @jmartin77777 - indeed this is almost definitely a problem with prodigal. My hunch is that the gene input file being provided to inStrain is incorrect; prodigal makes a bunch of different genes files and the .fna
one is the one you'll need to provide to inStrain.
Could you send me a head
of the genes files you're providing to inStrain?
Best, Matt
I think I've found my problem, I pulled the gene fasta directly from the UHGG ftp site and concatenated them to build my 'UHGG_reps.genes.fna' gene file. But their annotations were not built by prodigal, or maybe they stripped the Prodigal annotation from the headers when they did functional annotation. I was comparing the UHGG v1.0 assets that you built to what I had built, and your headers include the prodigal info, eg.:
GUT_GENOME000001_1_1 # 1 # 81 # 1 # ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.210
The genes fasta I built for v2.0.1 had headers like this:
MGYG000002944_00001 Carbamoyl-phosphate synthase small chain
and I noticed in the error out you are doing a split on '#', and at that point everything clicked. I am running prodigal now and will re-run inStrain profile on my mapped data again using a genes fasta built as your tutorial suggests. I'm pretty sure this was my issue, entirely my fault. Sorry for the trouble
Great! Thank you for the update. Please let me know if this doesn't fix the problem.
Best, Matt
I'm running inStrain profile on a set of samples that have been mapped to the Unified Human Gut Genomes (UHGG) v2.0.1, and all the samples are failing in 'Step 2. Profile scaffolds'. Immediately prior to this execution I had run inStrain profile on the same samples mapped against UHGG v1.0 successfully, so its likely the problem is with something I'm doing but I can't figure it out. For UHGG v1.0 you had prepared all the needed assets and made them available on your website, so for that version I was using your files. For the problematic jobs I'm using assets I created following the instructions from the user manual. Its possible that the error lies in one of those files. The error output I'm seeing is this:
I've been over the UHGG v2.0.1 files I made and they seem OK, and I noticed in the error output it mentions the Prodigal program. My first thought was that its somehow getting the wrong gene annotations, but then I realized that I provided no gene annotation. Prodigal is making the gene calls, so if its an issue with the tool not accepting the output of Prodigal that may actually be something internal. Does anyone have any advice on what this error may be?
I'm using an old version of inStrain (v1.5.3) to make the outputs comparable with some older work I've done, but if necessary I could try re-doing everything using the latest version. I'm just hoping maybe the error message might suggest something obvious before I start redoing everything