flass / pantagruel

a pipeline for reconciliation of phylogenetic histories within a bacterial pangenome
GNU General Public License v3.0

Step 08 clade_specific_genes: ERROR: step 5.1: failed #42

Closed. MartinezRuiz-Carlos closed this issue 3 years ago

MartinezRuiz-Carlos commented 4 years ago

Hello Florent, sorry, I think I might have stumbled into another issue in step 08. This time the problem seems to be in the GO term analysis. Here's the stdout I get:


will try and resume computation of task where it was last stopped
will run tasks: 8
[2020-09-06 09:57:26] Pantagruel pipeline task 8: classify genes into orthologous groups (OGs) and search clade-specific OGs.
Task folder '/home/carlos/Desktop/genomes_archea/panta_out/db_sc3/08.orthologs' already exists; -R|--resume option was used so Pantagruel will atempt to resume from an interupted previous run
generating ortholog collection from reconciled gene trees
 call: python2.7 /pantagruel/scripts/get_orthologues_from_ALE_recs.py -i /home/carlos/Desktop/genomes_archea/panta_out/db_sc3/07.reconciliations/fullgenetree_ALE_recs/nocollapse/noreplace/ale_fullgenetree_dated_1 -o /home/carlos/Desktop/genomes_archea/panta_out/db_sc3/08.orthologs/ortholog_collection_1 --threads=4  --ale.model=dated --methods=mixed --max.frac.extra.spe=0.5 --majrule.combine=0.5 --colour.combined.tree --use.unreconciled.gene.trees= --unreconciled.format=nexus --unreconciled.ext=.con.tre &> /home/carlos/Desktop/genomes_archea/panta_out/db_sc3/logs/get_orthologues_from_ALE_recs_ortholog_collection_1.log
step 1: complete generating ortholog collection from reconciled gene trees

importing ortholog classification into database
first delete previous records for this ortholog collection ('ortholog_collection_1') in the database '/home/carlos/Desktop/genomes_archea/panta_out/db_sc3/03.database/db_sc3'
step 2.0: completed importing ortholog collection record into database
step 2.1: completed importing ortholog classification into database for reconciled gene trees
step 2.2: completed importing ortholog classification into database for unreconciled gene trees

generating abs/pres matrix
ortholog_collection_1
building matrix of gene presence / absence for 9 genomes
examining a total of 12545 CDSs with non-ORFan family assignment
retrieveing orthology classification from collection: ortholog_col_id=1
1495 families not covererd by orthology classification (means no evolution scenario was inferred for these families)
0 families covererd by orthology classification into a total of 0 orthologous groups
these totalize 5 families with unique representative in the dataset (singletons) and 1490 others [total: 1495]
step 3: completed generating abs/pres matrix

listing clade-specific orthologs
step 4: completed listing clade-specific orthologs

null device 
          1 
Found 52422 functional annotation records linked to GO terms in the database '/home/carlos/Desktop/genomes_archea/panta_out/db_sc3/03.database/db_sc3'
Will now run GO term enrichment tests
step 5.1: 
Error: cannot join using column cds_code - column not present in both tables
Error: cannot join using column cds_code - column not present in both tables
clade0  (repr.: 'CUNDIV1'; size: 3) 'CUNDIV1','CUNDIV2','GPL37'
Error: cannot join using column cds_code - column not present in both tables
ERROR: step 5.1: failed  for clade clade0 including NULL go_id
ERROR: step 5.1: failed 
ERROR: Pantagruel pipeline task 8: failed.

I see a log file was created during the step that fails (get_orthologues_from_ALE_recs_ortholog_collection_1.log), but it is empty. The step does create several outputs in panta_out/db_sc3/08.orthologs/ortholog_collection_1/, including the .tab files containing the CDS info, and as far as I can see these do have a column called cds_code. Again, apologies if this is something very obvious I am missing, but I have been going in circles here for a while now. Thanks!

flass commented 3 years ago

Hi Carlos, thank you for reporting this. I was actually working on fixing this issue, as it was reported by another user. Your complete logs are really helpful, as I had trouble seeing where the problem was located; it seems that a column is wrongly named in the SQLite db. I already made an attempt to fix this in 00aaac7, but apparently it wasn't enough. I'll get back to you shortly on that one.
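For anyone hitting the same "cannot join using column cds_code" error, one way to see which tables actually carry the column is to inspect the SQLite schema directly. The following is only a minimal sketch, not part of Pantagruel: the helper name `tables_with_column` is made up for illustration, and the two tables in the demo are hypothetical stand-ins, not the real Pantagruel schema.

```python
import sqlite3


def tables_with_column(conn, column):
    """Map every table in the SQLite database to True/False: does it have `column`?

    Hypothetical helper for illustration only; not part of Pantagruel.
    """
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
    )
    tables = [row[0] for row in cur.fetchall()]
    report = {}
    for table in tables:
        # PRAGMA table_info yields one row per column; field 1 is the column name.
        info = conn.execute('PRAGMA table_info("%s")' % table).fetchall()
        report[table] = column in [row[1] for row in info]
    return report


if __name__ == "__main__":
    # Demo on an in-memory database with two made-up tables.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE table_a (cds_code TEXT, go_id TEXT)")
    demo.execute("CREATE TABLE table_b (cdscode TEXT, gene_family_id TEXT)")
    for name, present in tables_with_column(demo, "cds_code").items():
        print(name, "has cds_code" if present else "lacks cds_code")
```

To run it against a real database, replace the in-memory connection with `sqlite3.connect(...)` pointing at the database file named in the log above (the one under `03.database/`). A join on `cds_code` can only succeed between tables that both report the column.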

flass commented 3 years ago

I checked, and the issue you report should be fixed in the more recent versions. I just pushed c48ef5b (on branch master; e064da1 on branch usingGeneRax), where the logging should be more complete in case you still hit an error. Please try again with the updated script and let me know how it goes. Cheers, Florent

MartinezRuiz-Carlos commented 3 years ago

Thank you so much for addressing this so quickly, I will give it a go. Will I need to re-run previous steps (e.g. step 3)?

flass commented 3 years ago

No problem at all! No, you should be able to resume from task 08; adding the resume flag -R to the command will also save you some time there.

MartinezRuiz-Carlos commented 3 years ago

Hello Florent, sorry for the delay in running this. Unfortunately I get a (new) error. I downloaded the newest version from the master branch and re-installed the Docker image; when I run step 08 again I get this:

This is Pantagruel pipeline version 9088a88cf70b72cc5ed7a570ba6c23696ee1ffec using source code from repository '/pantagruel'
will try and resume computation of task where it was last stopped
 will run tasks: 8
[2020-09-21 21:10:23] Pantagruel pipeline task 8: classify genes into orthologous groups (OGs) and search clade-specific OGs.
Task folder '/home/carlos/Desktop/genomes_archea/panta_out/db_sc3/08.orthologs' already exists; -R|--resume option was used so Pantagruel will atempt to resume from an interupted previous run
generating ortholog collection from reconciled gene trees
# call: python2.7 /pantagruel/scripts/get_orthologues_from_ALE_recs.py -i /home/carlos/Desktop/genomes_archea/panta_out/db_sc3/07.reconciliations/fullgenetree_ALE_recs/nocollapse/noreplace/ale_fullgenetree_dated_1 -o /home/carlos/Desktop/genomes_archea/panta_out/db_sc3/08.orthologs/ortholog_collection_1 --threads=4  --ale.model=dated --methods=mixed --max.frac.extra.spe=0.5 --majrule.combine=0.5 --colour.combined.tree --use.unreconciled.gene.trees= --unreconciled.format=nexus --unreconciled.ext=.con.tre &> /home/carlos/Desktop/genomes_archea/panta_out/db_sc3/logs/get_orthologues_from_ALE_recs_ortholog_collection_1.log
step 1: complete generating ortholog collection from reconciled gene trees

importing ortholog classification into database
first delete previous records for this ortholog collection ('ortholog_collection_1') in the database '/home/carlos/Desktop/genomes_archea/panta_out/db_sc3/03.database/db_sc3'
step 2.0: completed importing ortholog collection record into database
step 2.1: completed importing ortholog classification into database for reconciled gene trees
step 2.2: completed importing ortholog classification into database for unreconciled gene trees

step 3: generating abs/pres matrix
ortholog_collection_1
building matrix of gene presence / absence for 9 genomes
examining a total of 12545 CDSs with non-ORFan family assignment
retrieveing orthology classification from collection: ortholog_col_id=1
1495 families not covererd by orthology classification (means no evolution scenario was inferred for these families)
0 families covererd by orthology classification into a total of 0 orthologous groups
these totalize 5 families with unique representative in the dataset (singletons) and 1490 others [total: 1495]
step 3: completed generating abs/pres matrix

listing clade-specific orthologs
ERROR: step 4: failed listing clade-specific orthologs; check specific logs in '/home/carlos/Desktop/genomes_archea/panta_out/db_sc3/logs/get_clade_specific_genes.log' for more details
ERROR: Pantagruel pipeline task 8: failed.

The log file simply contains:

/usr/bin/env: 'Rscript --vanilla': No such file or directory

To me this seems like an issue with the Docker image config?

flass commented 3 years ago

Hi Carlos,

Indeed, there was an issue with the use of the shebang #!/usr/bin/env Rscript --vanilla in the Docker image; apparently a shebang that goes through env cannot carry options, since the whole string 'Rscript --vanilla' ends up being looked up as a single program name, as your log shows. I changed all the R script shebangs to #!/usr/bin/env Rscript, which should fix your issue with the Docker image. This is implemented in 5bd6428 (master) and 8b582a0 (usingGeneRax); both should soon be built into a Docker image on Docker Hub (even though I have the impression you make your own builds). Please let me know if that works, and re-open the issue if not.

Cheers, Florent
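
To illustrate the shebang problem discussed above: on Linux, everything after the interpreter path on a `#!` line is handed over as one single argument, so `env` receives the literal string `Rscript --vanilla` and looks for a program with that name. The sketch below only mimics that splitting rule in Python; `split_shebang_linux` is a made-up helper for illustration, and other operating systems may split the line differently.

```python
def split_shebang_linux(line):
    """Mimic Linux shebang splitting: interpreter path plus at most ONE argument.

    Illustrative helper only; it models the behaviour, it does not query the kernel.
    """
    if not line.startswith("#!"):
        raise ValueError("not a shebang line")
    # Split on the first run of whitespace only: the remainder stays in one piece.
    parts = line[2:].strip().split(None, 1)
    if not parts:
        raise ValueError("shebang line has no interpreter")
    interpreter = parts[0]
    argument = parts[1] if len(parts) > 1 else None
    return interpreter, argument


if __name__ == "__main__":
    for shebang in ("#!/usr/bin/env Rscript --vanilla", "#!/usr/bin/env Rscript"):
        interpreter, argument = split_shebang_linux(shebang)
        print("%s -> interpreter=%r, single argument=%r" % (shebang, interpreter, argument))
```

With the original shebang the single argument is `Rscript --vanilla`, which matches the name quoted in the "No such file or directory" message; with the corrected shebang it is just `Rscript`, which `env` can find on the PATH.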