Open hkaspersen opened 4 months ago
It seems from the documentation that the core_gene_alignment.aln should be based on the genes included within the parameters specified with --core_threshold, as defined here. Perhaps this is an issue with the discrepancy between the stats file and the output?
@hkaspersen do we have info on where @karinlag ran the tests and which command was run? It is probably best to try to reproduce the error with these data.
@evezeyl yes that sounds like something we need to test. @karinlag could you provide us the data so we can replicate?
Reply from Gerry on Panaroo:
The --core_threshold parameter controls what goes into the core alignment. We have also added additional filtering steps which generate the core_gene_alignment_filtered.aln file, which generally gives much better phylogenies if that is what you're after. This filtering can be controlled with the --core_entropy_filter parameter. I think deciding what constitutes a core gene is often very context-specific. For example, if you're just building a tree, then an arbitrary threshold might be fine. However, if you're trying to determine whether a gene is found in all genomes of a species, this is likely to be insufficient.
Do we know what this means yet? I am not sure I am able to parse what this translates into for our context.
@evezeyl what I was inputting here was Rikki's plasmids. They are behaving very oddly. A progressiveMauve alignment shows that they are very similar (identical) over large regions, but parsnp still says they are too diverse, and panaroo says 0 shared genes. No idea what is happening.
@hkaspersen where are we here? Have we moved forward on this yet?
@karinlag detected that there was no correspondence between the detected core genes and the resulting alignment from Panaroo. An alignment was produced even though zero core genes were identified. We therefore have to figure out what actually goes into the alignment. We have asked Gerry on the µbioinfo slack about this.
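While we wait for an answer, one way to sanity-check the reported core gene count is to recompute it ourselves from the presence/absence matrix. The sketch below is not Panaroo's own code. It assumes a Roary-style tab-separated table (first column `Gene`, then one 0/1 column per genome, as in gene_presence_absence.Rtab), and it assumes a gene counts as core when the fraction of genomes carrying it is greater than or equal to the threshold. Whether Panaroo uses exactly this comparison is an assumption on our side. The function name `core_genes` and the toy table are made up for illustration.

```python
import csv
import io


def core_genes(rtab_handle, core_threshold=0.95):
    """Return genes present in at least `core_threshold` of the genomes.

    Assumes a tab-separated table: header row starting with 'Gene',
    followed by one column per genome containing 0/1 calls.
    The >= comparison is our assumption, not confirmed Panaroo behaviour.
    """
    reader = csv.reader(rtab_handle, delimiter="\t")
    header = next(reader)
    n_genomes = len(header) - 1  # first column holds the gene name
    core = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, calls = row[0], row[1:]
        present = sum(1 for call in calls if call.strip() == "1")
        if n_genomes and present / n_genomes >= core_threshold:
            core.append(gene)
    return core


# Toy table with four genomes (hypothetical data, for illustration only).
toy = (
    "Gene\tg1\tg2\tg3\tg4\n"
    "geneA\t1\t1\t1\t1\n"
    "geneB\t1\t1\t1\t0\n"
    "geneC\t0\t1\t0\t0\n"
)

print(core_genes(io.StringIO(toy), core_threshold=0.95))
print(core_genes(io.StringIO(toy), core_threshold=0.75))
```

Running this on the real table with the same threshold as the Panaroo run (for example `core_genes(open("gene_presence_absence.Rtab"), 0.95)`, path assumed) and comparing the number of genes returned with the stats file and with the genes present in the alignment should show where the mismatch arises.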