fwhelan / coinfinder

A tool for the identification of coincident (associating and dissociating) genes in pangenomes.
GNU General Public License v3.0

Issue with phytools and question regarding isolate number #42

Closed reymonera closed 3 years ago

reymonera commented 3 years ago

Hi,

First of all, thanks for coinfinder. I found the research in the paper interesting and would like to apply some of its ideas to a pangenome I'm working with. However, I have some questions:

1) I'm facing an issue with the phytools package. The error looks like this:

coinfinder -i gene_presence_absence.csv -I -p RAxML_bestTree.newick -o coinfinder_prueba -F 0 -n --associate
Reading arguments...
> CORRECTION ······· = NONE
> METHOD ··········· = COINCIDENCE
> ALT_HYPOTHESIS ··· = GREATER
> MAX_MODE ········· = ACCOMPANY
> SET_MODE ········· = FULL
> PERMIT_FILTER ···· = NO
> VERBOSE ·········· = NO
> OUTPUT_ALL ······· = NO
> FRACTION CUTOFF ·· = N/A
> SIGNIFICANCE_LEVEL = 0.05
> COMBINED_FILE ···· = gene_presence_absence.csv
> GENE_NAME ······· = Genes
> GENOME_NAME ········ = Genomes
Formating Roary output for input into coinfinder...
Reading gene-genome edges...
- n.GENES = 1684
- n.GENOMES  = 18
- n.EDGES = 22022
Dropping saturated sets...
**Warning**: Saturated data has been dropped!
- d.GENES = -980
- n.GENES = 704
- n.GENOMES  = 18
- n.EDGES = 22022
Dropping rare elements in collection...
Nothing dropped due to rare elements in collection, your data is good to go. :)
Dropping empty sets...
Nothing dropped, your data is good to go.
Iterating matrix...
No significance correction, the significance level is 0.05.
Running analyses...
Calculating lineage dependence.
ERROR MESSAGE FROM R: 
Error: package or namespace load failed for ‘phytools’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/home/marlen/anaconda3/lib/R/library/phangorn/libs/phangorn.so':
  libopenblas.so.0: error while loading shared libraries: cannot open shared object file: No such file or directory
Interrupted

I have already checked that phangorn is installed, and I can find it in my conda environment. However, I'm still getting the same error. Is there something I need to modify?

2) My second question is about the number of isolates needed for the output. I'm currently working with two datasets: one with 18 isolates and one with 72. Note that the bacterial genomes I'm working with are small: most of them are 2 or 3 Mb. In both cases, the standard command didn't find any associations, which is why I decided to turn off the filtering with the -F flag. The coinfinder paper worked with 209 genomes. So my question is: what are the recommendations for working with small sample sizes? Is it recommended at all?

Thanks for any answer you could give me!

fwhelan commented 3 years ago

Hi reymonera! Thanks for your kind words about coinfinder and sorry that you're running into this issue.

I'll answer your second question first. Although there isn't a hard-and-fast rule about how many genomes you need, we found in the original Coinfinder paper that somewhere between 50 and 100 is a good minimum (https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000338, Table 3). I would expect your n=72 dataset to be okay, but 18 isolates probably won't be enough for coinfinder to say anything significant about your data, unfortunately.

fwhelan commented 3 years ago

As for your first question, I'm not too sure. It seems like phytools isn't loading properly because R can't load phangorn.so. If you try library(phytools) in the R terminal in your conda environment (but separately from coinfinder), do you get a similar error?

It's not the same, but the comments in this issue might be helpful: https://github.com/igraph/rigraph/issues/275
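
The last line of the R error in the log names libopenblas.so.0 as the shared object that could not be opened, so one way to narrow this down outside of coinfinder is to ask the dynamic loader which libraries phangorn.so cannot resolve. The following is only a sketch: `missing_libs` is a hypothetical helper name (not part of coinfinder or R), the path is copied from the error message and will differ on other machines, and it assumes a Linux system with `ldd`, `awk` and `Rscript` available in the active conda environment.

```shell
#!/usr/bin/env bash
# Sketch: list shared libraries that the dynamic loader cannot resolve.
# missing_libs is a hypothetical helper, not part of coinfinder or R.

# Reads `ldd` output on stdin and prints the names of unresolved libraries.
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Path copied from the error message above; adjust it for your environment.
so="/home/marlen/anaconda3/lib/R/library/phangorn/libs/phangorn.so"

if [ -f "$so" ]; then
    # Inspect phangorn.so directly, independent of R and coinfinder.
    ldd "$so" | missing_libs
    # Then try loading the package in R from the same environment.
    Rscript -e 'library(phytools)'
fi
```

If `libopenblas.so.0` shows up in that list, the problem is likely in the environment's BLAS installation rather than in coinfinder itself; reinstalling OpenBLAS in the same conda environment may help, though that is an assumption and not something confirmed in this thread.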

fwhelan commented 3 years ago

Closed due to inactivity.