NorwegianVeterinaryInstitute / ALPPACA

A tooL for Prokaryotic Phylogeny And Clustering Analysis
BSD 3-Clause "New" or "Revised" License

Panaroo core gene alignment issue #120

Open hkaspersen opened 4 months ago

hkaspersen commented 4 months ago

@karinlag detected that there was no correspondence between the detected core genes and the resulting alignment from Panaroo: an alignment was produced even though zero core genes were identified. We therefore need to figure out what actually goes into the alignment. We have asked Gerry about this on the µbioinfo Slack.
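One quick way to see what the alignment actually contains is to count its sequences, its length, and the columns that consist only of gaps. The sketch below is a minimal example that assumes core_gene_alignment.aln is a FASTA-format multiple alignment; the function name `alignment_summary` is hypothetical and not part of Panaroo.

```python
def alignment_summary(fasta_text):
    """Return (number of sequences, alignment length, number of all-gap columns)
    for a FASTA-formatted multiple alignment passed in as a string.

    Assumption: the alignment is plain FASTA and gaps are written as '-'.
    """
    records = []  # list of (header, list of sequence lines)
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append((line[1:].strip(), []))
        elif records:
            records[-1][1].append(line)

    seqs = ["".join(parts) for _, parts in records]
    if not seqs:
        return 0, 0, 0

    lengths = {len(s) for s in seqs}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; not a valid alignment")
    length = lengths.pop()

    # A column is "empty" if every sequence has a gap at that position.
    gap_cols = sum(1 for i in range(length) if all(s[i] == "-" for s in seqs))
    return len(seqs), length, gap_cols
```

For example, `alignment_summary(">genomeA\nATG--C\n>genomeB\nATG--T\n")` returns `(2, 6, 2)`. An alignment with many sequences but zero length, or one made up almost entirely of gap columns, would be consistent with zero genes having passed the core filter.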

hkaspersen commented 4 months ago

From the documentation, it seems that core_gene_alignment.aln should be based on the genes that fall within the threshold specified with --core_threshold, as defined here. Perhaps the issue is a discrepancy between the stats file and the alignment output?
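To cross-check the stats file independently, we could count how many genes pass a given threshold ourselves. This is a minimal sketch under stated assumptions: it assumes a tab-separated presence/absence table with a header row of genome names and one row per gene of 0/1 flags (the layout we believe Panaroo's gene_presence_absence.Rtab uses; check against the actual output). The helpers `parse_rtab` and `core_genes` are hypothetical, and Panaroo's own comparison may differ in detail (for example `>=` versus `>`).

```python
def parse_rtab(text):
    """Parse a tab-separated gene presence/absence table into
    {gene name: [0/1 flag per genome]}.

    Assumption: first row is a header (gene column + genome names),
    each later row is a gene name followed by 0/1 values.
    """
    rows = [line for line in text.splitlines() if line.strip()]
    presence = {}
    for row in rows[1:]:
        fields = row.split("\t")
        presence[fields[0]] = [int(x) for x in fields[1:]]
    return presence


def core_genes(presence, threshold):
    """Return genes present in at least `threshold` (a fraction 0-1) of genomes."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be a fraction between 0 and 1")
    core = []
    for gene, flags in presence.items():
        # Fraction of genomes carrying this gene.
        if flags and sum(flags) / len(flags) >= threshold:
            core.append(gene)
    return core
```

With a table where geneA is in 4/4 genomes, geneB in 3/4 and geneC in 1/4, a threshold of 0.95 keeps only geneA, while 0.75 keeps geneA and geneB. If this count is zero but the alignment still contains sequence, that points to a mismatch between the stats and the alignment step.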

evezeyl commented 4 months ago

@hkaspersen do we have information on where @karinlag ran the tests and which command was used? It is probably best to try to reproduce the error with these data and track it down from there.

hkaspersen commented 4 months ago

@evezeyl yes, that sounds like something we need to test. @karinlag could you provide us with the data so we can replicate it?

hkaspersen commented 4 months ago

Reply from Gerry on Panaroo:

The --core_threshold parameter controls what goes into the core alignment. We have also added additional filtering steps which generates the core_gene_alignment_filtered.aln file which generally gives much better phylogenies if that is what you're after. This filtering can be controlled with the --core_entropy_filter parameter. I think deciding what constitutes a core gene is often very context-specific. For example, if you're just building a tree, then an arbitrary threshold might be fine. However, if you're trying to determine whether a gene is found in all genomes of a species, this is likely to be insufficient.

karinlag commented 4 months ago

Do we know what this means yet? I am not sure what this translates into for our context.

@evezeyl the data I was inputting here were Rikki's plasmids. They are behaving very oddly: a progressiveMauve alignment shows that they are very similar (identical) over large regions, yet parsnp says they are too diverse, and Panaroo reports 0 shared genes. No idea what is happening.

karinlag commented 12 hours ago

@hkaspersen where are we here? Have we moved forward on this yet?