Closed wwood closed 4 years ago
Check this line in the code, it should be clear from there.
Essentially it checks the ratio of shared genome length vs. the length of the smaller of two genomes being compared. In older versions, FastANI had an absolute cutoff on shared genome length before trusting the ANI value, which was not good as the cutoff value should ideally depend on genome lengths.
So shared over length of smallest genome, rather than shared over length of smallest genome's hashed fragments, is that right?. So if a significant portion of the smallest genome is in small contigs, a significant portion cannot count in the numerator but is included in the denominator. Does that make sense? Should it be changed somehow?
I think I understand your point. Are you able to attach the fna files here? I'll take a look.
I've sent the MAGs to you over email just now.
Apologies for the delay in my response. I've revised the master branch to fix this. It is now working on this example.
$EXE -q MAG52.fna -r MAG189.fna -o /dev/stdout --minFraction 0.5
MAG52.fna MAG189.fna 97.4762 228 629
$EXE -r MAG52.fna -q 189.fna -o /dev/stdout --minFraction 0.5
MAG189.fna MAG52.fna 98.351 232 255
Thanks. You can use the genomes as public test cases if that is helpful, as galah has done.
From: Chirag Jain notifications@github.com Sent: Thursday, August 20, 2020 3:22:41 PM To: ParBLiSS/FastANI FastANI@noreply.github.com Cc: Ben J Woodcroft donttrustben@gmail.com; Author author@noreply.github.com Subject: Re: [ParBLiSS/FastANI] --minFraction too stringent for fragmented genomes? (#70)
Apologies for the delay in my response. I've revised the master branch to fix this. It is now working on this example.
$EXE -q MAG52.fna -r MAG189.fna -o /dev/stdout --minFraction 0.5 BE_RX_R2_MAG52.fna BE_RX_R3_MAG189.fna 97.4762 228 629 $EXE -r MAG52.fna -q 189.fna -o /dev/stdout --minFraction 0.5 BE_RX_R3_MAG189.fna BE_RX_R2_MAG52.fna 98.351 232 255
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/ParBLiSS/FastANI/issues/70#issuecomment-677117909, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAADX5B7QRLIYROIXQNO7JTSBSXKDANCNFSM4O7A657Q.
Hi, would it be possible to add this fix to a release?
Done.
Hi, thanks a report by @apcamargo at https://github.com/wwood/galah/issues/7 I came across an issue with
--minFraction
on these fragmented genomes. They seem to align well:But when
--minFraction
is used the hit goes away. This is even though 232/255 > 0.5:Filtering out this alignment by the minFraction seems incorrect to me. I wonder what the definition of the minFraction actually is. Is it the fraction of the total genome length or the fraction of the genome that is long enough to be included as a fragment, or something along those lines?
Thanks, ben