fwhelan / coinfinder

A tool for the identification of coincident (associating and dissociating) genes in pangenomes.
GNU General Public License v3.0

Error in if (any(nc)) x[nc] <- substr(x[nc], 1, len): missing value where TRUE/FALSE needed #76

Closed: GiacomoBroglia closed this issue 4 months ago

GiacomoBroglia commented 5 months ago

Dear Coinfinder developers, we are running Coinfinder on 1580 genomes and 100 genes (KO-annotated). This is the command line we used:

coinfinder -i $SCRATCH/Coinfinder_analysis/KO_list_COIN.tsv \
                -p $SCRATCH/Coinfinder_analysis/Terr_nozero_Tree.nwk \
                --associate \
                --dissociate \
                --bonferroni \
                --num_cores 48 \
                -v \
                -R \
                -o Output_COIN_New2 

Everything works fine until the R analysis, which stops at the step "Calculating lineage dependence..." with the following error:

Calculating lineage dependence...
Error in if (any(nc)) x[nc] <- substr(x[nc], 1, len) : 
  missing value where TRUE/FALSE needed
Calls: makeLabel -> makeLabel.phylo -> makeLabel.character
Execution halted 

Attached are the log file of the run and the input data needed to reproduce the error. Thank you in advance for your help.

Attachments: KO_list_COIN_TEST.txt, Terr_nozero_Tree.txt, G_CoinTest2_TerriErr.txt

fwhelan commented 4 months ago

Hello,

I haven't been able to pin this down yet, but I believe it is caused by the makeLabel(tree) call in lineage.R. Perhaps the special characters in some of your node labels are the issue, or the fact that some nodes have labels and others don't.

I'll keep trying...
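To test both hypotheses (special characters in node labels, and a mix of labeled and unlabeled internal nodes) on a given tree, a quick check could list the label attached to each internal node of the Newick string. The sketch below is a minimal hand-written parser, not part of coinfinder or ape. It assumes single-quoted labels follow the Newick convention ('' stands for a literal quote) and that the file has no bracketed [comments]. The names read_label, internal_labels and report are hypothetical, and the "unusual character" set is an arbitrary conservative choice.

```python
import re

STRUCTURAL = "(),:;"


def read_label(s: str, i: int) -> tuple:
    """Read one label starting at s[i]; return (label, index just past it).

    Single-quoted labels may contain structural characters, and '' inside
    the quotes is an escaped quote (Newick convention, assumed here).
    """
    n = len(s)
    if i < n and s[i] == "'":
        j = i + 1
        while j < n:
            if s[j] == "'":
                if j + 1 < n and s[j + 1] == "'":
                    j += 2  # escaped quote, still inside the label
                    continue
                break  # closing quote
            j += 1
        return s[i:j + 1], j + 1
    j = i
    while j < n and s[j] not in STRUCTURAL:
        j += 1
    return s[i:j], j


def internal_labels(newick: str) -> list:
    """Return the label after each ')' ('' when the internal node is unlabeled)."""
    labels = []
    i, n = 0, len(newick)
    while i < n:
        c = newick[i]
        if c == ")":
            label, i = read_label(newick, i + 1)
            labels.append(label.strip())
        elif c in STRUCTURAL:
            i += 1
        else:
            # leaf label or branch length: skip it as a single token
            _, i = read_label(newick, i)
    return labels


def report(newick: str) -> dict:
    """Count labeled internal nodes and collect labels with unusual characters."""
    labels = internal_labels(newick)
    labeled = [lab for lab in labels if lab]
    unusual = [lab for lab in labeled if re.search(r"[^A-Za-z0-9_.]", lab)]
    return {"internal": len(labels), "labeled": len(labeled), "unusual": unusual}
```

For a real tree, something like report(open("Terr_nozero_Tree.nwk").read()) would show whether only some internal nodes carry labels and which of those labels contain characters such as quotes, colons, semicolons or spaces.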

GiacomoBroglia commented 4 months ago

Hello Fiona,

Thanks for your response. I just figured out what the problem was. The tree I used was generated directly by GTDB-Tk, which adds internal node IDs, and these cause problems for the makeLabel.character() function. When I built the tree directly from the .msa file instead, no internal node IDs were generated, and this solved the problem for my analysis. Thank you again for your help.
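The workaround above rebuilds the tree so that it has no internal node IDs. If rebuilding is inconvenient, a filter like the following might strip the internal node labels from an existing Newick tree instead. This is a minimal sketch, not part of coinfinder. It assumes single-quoted labels follow the Newick convention ('' is an escaped quote) and that the file contains no bracketed [comments]. The names skip_quoted and strip_internal_labels are hypothetical. It keeps leaf names and branch lengths but also drops any support values stored as internal labels. Whether the result avoids the makeLabel error on a particular tree is an assumption worth checking. In R, ape offers a shorter route: set tree$node.label <- NULL on the object returned by read.tree() and save it with write.tree().

```python
STRUCTURAL = "(),:;"


def skip_quoted(s: str, i: int) -> int:
    """Given s[i] == "'", return the index just past the matching closing quote."""
    n = len(s)
    j = i + 1
    while j < n:
        if s[j] == "'":
            if j + 1 < n and s[j + 1] == "'":
                j += 2  # '' is an escaped quote inside the label
                continue
            return j + 1
        j += 1
    return n  # unterminated quote: treat the rest of the string as the label


def strip_internal_labels(newick: str) -> str:
    """Drop every internal-node label, i.e. the text right after each ')'."""
    out = []
    i, n = 0, len(newick)
    after_close = False  # True while inside the label slot that follows ')'
    while i < n:
        c = newick[i]
        if c == "'":
            end = skip_quoted(newick, i)
            if not after_close:
                out.append(newick[i:end])  # quoted leaf labels are kept
            i = end
            continue
        if c in STRUCTURAL:
            after_close = c == ")"
            out.append(c)
        elif not after_close:
            out.append(c)  # leaf labels and branch lengths are kept
        i += 1
    return "".join(out)
```

For example, strip_internal_labels turns "((A:0.1,B:0.2)95:0.3,C:0.4)root;" into "((A:0.1,B:0.2):0.3,C:0.4);", and the cleaned string can then be written to a new .nwk file and passed to coinfinder with -p.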