fwhelan / coinfinder

A tool for the identification of coincident (associating and dissociating) genes in pangenomes.
GNU General Public License v3.0

header of _pairs.tsv? #34

Closed ramiroricardo closed 3 years ago

ramiroricardo commented 3 years ago

Hi Fiona,

Thanks for developing such a cool tool.

A simple question: where can I find the exact meaning of each of the columns in the _pairs.tsv files?

Thanks

fwhelan commented 3 years ago

Hi ramiroricardo, ah, very good question. I can't remember if this is described in the publication, but I don't think so. I'll add this to the readme later today!

Source: first gene's name
Target: second gene's name
p: p-value (corrected if a correction was applied) for the interaction
Avg synthetic distance: a blank column of 0s; this is an artifact of when I was trying to use information in the roary output to calculate the average distance between the 2 genes. I will remove this in my next commit.
successes: # of times the 2 genes are found together (or apart if using -d)
observations: # of genomes that 1 or both genes are found in
rate: observed rate of association/dissociation of the genes
expected: expected rate of association/dissociation of the genes
total source: total # of genomes the first gene is observed in
total target: ditto for the second gene
fraction source: total source / total # of genomes in the dataset
fraction target: ditto for total target
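For anyone reading the file programmatically, here is a minimal sketch of loading such a table and filtering on p. It assumes the header names match the column names listed above, which may differ from what coinfinder actually writes, and it uses an invented two-row sample and a hypothetical 0.05 cutoff in place of a real _pairs.tsv.

```python
import csv
import io

# Invented sample rows for illustration only. The header names are an
# assumption based on the column names described in this thread.
SAMPLE = (
    "Source\tTarget\tp\tsuccesses\tobservations\ttotal source\ttotal target\n"
    "geneA\tgeneB\t0.001\t40\t60\t50\t50\n"
    "geneC\tgeneD\t0.2\t5\t90\t45\t50\n"
)

# Columns converted from text to numbers after parsing.
NUMERIC = ("p", "successes", "observations", "total source", "total target")


def read_pairs(handle):
    """Parse a tab-separated pairs table into a list of dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for key in NUMERIC:
            row[key] = float(row[key])
        rows.append(row)
    return rows


def significant(rows, alpha=0.05):
    """Keep the gene pairs whose p-value is at or below alpha."""
    return [r for r in rows if r["p"] <= alpha]


pairs = read_pairs(io.StringIO(SAMPLE))
hits = significant(pairs)
for r in hits:
    print(r["Source"], r["Target"], r["p"])
```

To use it on a real file, pass an open file handle to read_pairs instead of the io.StringIO sample, and adjust NUMERIC to the headers actually present.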

Let me know if any of the above isn't clear. I'll leave this issue open until I update the readme with this information and make the appropriate commits.

Thanks for your question! --Fiona

ramiroricardo commented 3 years ago

Hi Fiona,

Thanks a lot for the quick reply.

All clear to me, except rate. I understand it is the observed rate, but from looking at the paper I would expect this to be a count, though all the values I have range from 0 to 1. This is the case for both dissociation and association in a test dataset with ~600 genomes. Is rate standardized in some way?

Thanks

fwhelan commented 3 years ago

Hi ramiroricardo,

You're correct, I apologize. What is currently output as rate is (using the notation from the manuscript) EA(ij) = Pi * Pj. There was no real reason for this; I'll commit code now to fix this to match the manuscript so as not to cause confusion.

fwhelan commented 3 years ago

Apologies, I misspoke in my initial response:

rate: Pi*Pj
expected: Pi*Pj*N = EA(ij)
successes: Nij = OA(ij)

Outputting rate isn't really necessary or helpful to the user and is confusing, so I'll remove it now.
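As a worked illustration of the corrected definitions, the sketch below computes rate and expected for the association case. It assumes that Pi and Pj correspond to the fraction source and fraction target columns (total source / N and total target / N); that mapping is an inference from this thread, not something confirmed by the coinfinder code. The function name and the input counts are invented.

```python
def association_stats(total_source, total_target, n_genomes):
    """Return (rate, expected) for a gene pair, association case only.

    Assumes Pi = total_source / N and Pj = total_target / N.
    """
    p_i = total_source / n_genomes   # fraction of genomes with gene i
    p_j = total_target / n_genomes   # fraction of genomes with gene j
    rate = p_i * p_j                 # Pi*Pj, always between 0 and 1
    expected = rate * n_genomes      # Pi*Pj*N = EA(ij), a count
    return rate, expected


# Invented counts: gene i in 300 genomes, gene j in 200, out of 600 genomes.
rate, expected = association_stats(300, 200, 600)
print(f"rate={rate:.4f} expected={expected:.1f}")
```

Because rate is a product of two fractions, it always falls between 0 and 1, which is why it did not look like a count; expected is the value to compare against successes (Nij).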

ramiroricardo commented 3 years ago

Hi Fiona,

Thanks a lot for your help. This makes things clearer.