SionBayliss / PIRATE

A toolbox for pangenome analysis and threshold evaluation.
GNU General Public License v3.0

pangenome_variants_to_treeWAS #91

Open ingridvanw opened 1 month ago

ingridvanw commented 1 month ago

Hi guys,

A number of additional scripts are provided (in the scripts and tools directories) for converting or analysing the PIRATE outputs. I am using the one that converts the outputs and runs treeWAS (pangenome_variants_to_treeWAS.pl).

When I run:

perl pangenome_variants_to_treeWAS.pl -i /results/ -o /output/ -m /metadata/phenotype_for_pirate.csv -tree /output/tree.parsimony.tre --treeWAS /scripts/

The PIRATE files are successfully converted (accessory_alleles.treewas_input, accessory.treewas_input and core_alleles.treewas_input are created). However, the results after running treeWAS are empty. It created three folders (for example, accessory_alleles), each containing one log file that shows:

running '/usr/lib/R/bin/R --no-echo --no-restore --file=/scripts/run_treeWAS.R --args /accessory_alleles.treeWAS_input /metadata/phenotype_for_pirate.csv /output/tree.parsimony.tre /accessory_alleles'
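The log records the exact R call, so one cheap check is whether every path passed after `--args` actually exists; for example, the input path in this log is not under /output/. A minimal Python sketch, assuming the log line has the `running '...'` form shown above; `check_logged_args` is a hypothetical helper, not part of PIRATE:

```python
import os
import shlex

def check_logged_args(log_line):
    """Return {path: exists} for each argument after --args in a logged R call."""
    cmd = log_line.strip()
    # Drop the leading "running " and the single quotes around the command.
    if cmd.startswith("running "):
        cmd = cmd[len("running "):]
    cmd = cmd.strip("'")
    tokens = shlex.split(cmd)
    if "--args" not in tokens:
        return {}
    # Everything after --args is handed to run_treeWAS.R.
    script_args = tokens[tokens.index("--args") + 1:]
    # os.path.exists is True for both files and directories.
    return {arg: os.path.exists(arg) for arg in script_args}

LOG_LINE = ("running '/usr/lib/R/bin/R --no-echo --no-restore "
            "--file=/scripts/run_treeWAS.R --args /accessory_alleles.treeWAS_input "
            "/metadata/phenotype_for_pirate.csv /output/tree.parsimony.tre /accessory_alleles'")

for path, ok in check_logged_args(LOG_LINE).items():
    print(path, "OK" if ok else "MISSING")
```

A MISSING entry for the input file would point at a path problem rather than at treeWAS itself.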

However, the .tab files created in the results are empty apart from the header row:

SNP_locus allele_name family_name gene_name product pvalue score G1P1 G0P0 G1P0 G0P1 test variant_type
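To confirm which result files contain nothing but this header, the data rows can be counted per file. A minimal sketch, assuming the .tab files are tab-delimited with a single header line (an assumption based on the extension); `count_data_rows` and `summarise_folder` are hypothetical helpers:

```python
import csv
from pathlib import Path

def count_data_rows(lines):
    """Count non-blank rows after the header in tab-delimited text lines."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row, if there is one
    # Blank lines parse as [] and are not counted.
    return sum(1 for row in reader if any(field.strip() for field in row))

def summarise_folder(folder):
    """Map each .tab file in a treeWAS output folder to its number of data rows."""
    counts = {}
    for path in sorted(Path(folder).glob("*.tab")):
        with open(path, newline="") as handle:
            counts[path.name] = count_data_rows(handle)
    return counts
```

For example, `summarise_folder("/output/accessory_alleles")` would report 0 for every file in the situation described here.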

Does anyone have an idea? Thanks in advance!

SionBayliss commented 1 month ago

Hi,

That is very odd. So you are saying the input files contain your genes, but they are removed after running treeWAS? Do your gene names contain any odd or protected characters? Could it be that carriage-return characters in the files are incompatible? The Perl scripts were developed for Linux, so running dos2unix or unix2dos might resolve the problem if that is the case.
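Checking for Windows-style CRLF line endings, and converting them to LF much as a basic `dos2unix` run would, can be sketched in Python as below. This only handles the CRLF case suggested above and is an assumption about the likely problem; the function names are hypothetical:

```python
from pathlib import Path

def has_crlf(data):
    """True if the bytes contain any Windows-style CRLF line endings."""
    return b"\r\n" in data

def crlf_to_lf(data):
    """Replace every CRLF with LF, leaving all other bytes untouched."""
    return data.replace(b"\r\n", b"\n")

def normalise_file(path):
    """Rewrite a file in place with LF endings; return True if it changed."""
    p = Path(path)
    data = p.read_bytes()
    if not has_crlf(data):
        return False
    p.write_bytes(crlf_to_lf(data))
    return True
```

Running `normalise_file` on the phenotype CSV, the tree and the treeWAS input files rewrites them in place, so keeping copies first is sensible.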

S

ingridvanw commented 1 month ago

I don't see any odd or protected characters, and there are no spaces. accessory_alleles.treeWAS_input looks like:

Samples g00827_00002 g00827_00005 g00827_00006 g00827_00014 g00827_00022
SAB003 1 1 0 1 0
SAB005 1 1 0 1 0
SAB006 1 1 0 1 0
SAB007 1 1 0 1 0
SAB008 1 1 0 1 0
SAB009 1 1 0 1 0

It seems that the script sees no variants in accessory_alleles.treeWAS_input?
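In the rows shown, every locus has the same value in all six samples (g00827_00002, g00827_00005 and g00827_00014 are always 1; g00827_00006 and g00827_00022 are always 0). If the full file looks like this, there may be no variation for treeWAS to test. A minimal sketch to list invariant loci, assuming the whitespace-separated layout shown (header starting with `Samples`, one row per sample); `invariant_loci` is a hypothetical helper:

```python
def parse_matrix(lines):
    """Parse 'Samples locus1 locus2 ...' then one 'sample v1 v2 ...' row per sample."""
    rows = [line.split() for line in lines if line.strip()]
    loci = rows[0][1:]
    samples = {row[0]: row[1:] for row in rows[1:]}
    return loci, samples

def invariant_loci(lines):
    """Return the loci whose value is identical across every sample."""
    loci, samples = parse_matrix(lines)
    columns = zip(*samples.values())  # one tuple of values per locus
    return [locus for locus, values in zip(loci, columns) if len(set(values)) == 1]

EXAMPLE = """Samples g00827_00002 g00827_00005 g00827_00006 g00827_00014 g00827_00022
SAB003 1 1 0 1 0
SAB005 1 1 0 1 0
SAB006 1 1 0 1 0
SAB007 1 1 0 1 0
SAB008 1 1 0 1 0
SAB009 1 1 0 1 0"""

print(invariant_loci(EXAMPLE.splitlines()))
# ['g00827_00002', 'g00827_00005', 'g00827_00006', 'g00827_00014', 'g00827_00022']
```

The real file can be checked with `invariant_loci(open("accessory_alleles.treeWAS_input"))`, since an open file iterates over lines.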

Best regards, Ingrid

SionBayliss commented 1 month ago

Hi Ingrid,

Have you successfully run treeWAS on similar, example or dummy data? Are you sure that filters, such as those for allele frequency, are not removing some or all of the data you are providing? If the data look suitable as input and that format clearly works with treeWAS, then the issue may lie with treeWAS and not with the conversion script.
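Whether a frequency filter could discard everything can be estimated by computing each locus's minor-allele frequency across samples and counting how many loci clear a cut-off. A minimal sketch, assuming binary 0/1 calls per locus; the 0.01 threshold and the function names are hypothetical and are not treeWAS's actual settings:

```python
def minor_allele_frequency(values):
    """Frequency of the less common state among binary 0/1 calls."""
    calls = [int(v) for v in values]
    if not calls:
        return 0.0
    ones = sum(calls) / len(calls)
    return min(ones, 1.0 - ones)

def loci_passing_maf(matrix, threshold=0.01):
    """Return loci whose minor-allele frequency is at least `threshold`.

    `matrix` maps locus name -> list of 0/1 calls across samples.
    """
    return [locus for locus, values in matrix.items()
            if minor_allele_frequency(values) >= threshold]

matrix = {"g00827_00002": [1, 1, 1, 1], "g00827_00006": [0, 0, 0, 0], "locus_x": [1, 0, 1, 0]}
print(loci_passing_maf(matrix))
# ['locus_x']
```

An empty result for the real data under any reasonable threshold would be consistent with treeWAS reporting no loci.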

All the best, Sion