MrTomRod / scoary-2

Calculate associations between genes and traits
MIT License

missing values in input #14

Open cmetadea opened 1 month ago

cmetadea commented 1 month ago

Hi, if I have a missing value in my Roary input and want to use it as Scoary input, can I replace it with NA? Will that affect Scoary's results? Thanks

MrTomRod commented 1 month ago

Hey @cmetadea

This is very strange! I assumed that this can't happen. Is this output from regular Roary? Can you show me the file?

To your question: I think Scoary2 will fail, or just assume the N/A genes are absent. Scoary2 can handle N/As in traits, but not in genes.

Best, Thomas

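Edit: For future reference, Scoary2 treats N/A values in the gene table as meaning the gene is absent. See the relevant code here. The pandas session below shows the underlying behavior: a comparison against NaN evaluates to False.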

```python
>>> print(df)
     A    B     C
0  1.0  5.0   9.0
1  2.0  NaN  10.0
2  NaN  7.0  11.0
3  4.0  8.0   NaN
4  5.0  NaN  13.0
>>> df >= 1
       A      B      C
0   True   True   True
1   True  False   True
2  False   True   True
3   True   True  False
4   True  False   True
```
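
For anyone who wants to check their own table before running Scoary2, here is a minimal sketch. It assumes the gene data is already in a pandas DataFrame of numeric counts. The table, the gene and isolate names, and the `>= 1` threshold are hypothetical and only mirror the session above. They are not taken from the Scoary2 source. The sketch counts missing values per gene and confirms that thresholding NaN directly gives the same result as filling with 0 first.

```python
import numpy as np
import pandas as pd

# Hypothetical gene-count table (illustration only): rows are genes,
# columns are isolates, NaN marks a missing value.
genes = pd.DataFrame(
    {
        "isolate_1": [1.0, 2.0, np.nan],
        "isolate_2": [np.nan, 1.0, 0.0],
        "isolate_3": [1.0, np.nan, 3.0],
    },
    index=["geneA", "geneB", "geneC"],
)

# Count missing values per gene so that the decision is a deliberate one.
missing_per_gene = genes.isna().sum(axis=1)

# Thresholding directly: any comparison with NaN evaluates to False,
# so a missing value ends up looking like an absent gene.
implicit = genes >= 1

# The same decision made explicit: replace missing counts with 0 first.
explicit = genes.fillna(0) >= 1

print(missing_per_gene)
print("Same presence/absence matrix:", implicit.equals(explicit))
```

If treating a missing value as "absent" is not what you want for your data, the per-gene counts show which rows to inspect or drop before the analysis.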
cmetadea commented 1 month ago

Hi, it's the output from Panaroo (it can produce a Roary-lookalike file). Can I email you the file instead? I'll also try running it with Roary. I was looking for newer tools since Roary is no longer maintained. Thanks

MrTomRod commented 1 month ago

I think Roary is fine, or OrthoFinder.

You can find my email here. Won't have time to look at the file today, though.