fslaborg / FSharp.Stats

statistical testing, linear algebra, machine learning, fitting and signal processing in F#
https://fslab.org/FSharp.Stats/
Other
205 stars 54 forks source link

[BUG] documentation issue: Statistical Testing / SAM #273

Closed smoothdeveloper closed 11 months ago

smoothdeveloper commented 12 months ago

I'd like to propose some fixes to this piece of the documentation (broken link, image, missing data, maybe some terms don't fit the image anymore).

https://fslab.org/FSharp.Stats/Testing.html#SAM

My main problem is that there is no TestDataSAM.txt.

@zieglerse, wondering if by chance you have a copy of this file?

zieglerSe commented 12 months ago

Of course! The TestDataSAM.txt file is located in FSharp.Stats\tests\FSharp.Stats.Tests\data\TestDataSAM.txt. Also, you can find a copy of it attached here: TestDataSAM.txt

smoothdeveloper commented 12 months ago

@zieglerSe sorry for having missed it in the repository and thanks for the other PR to adjust the docs, I'll be able to proceed if I see further adjustments to suggest.

In #272 where I'm suggesting some changes in the overall code (not SAM only), there is seemingly a missing check for -infinity in the implementation of SAM, could you confirm if this is an oversight in which case I'll try to adjust a test case that would cover it and add the check for -infinity.

Thanks!

zieglerSe commented 12 months ago

@smoothdeveloper no problem!

I saw #272 and checked it - the missing check for -infinity was an oversight. Feel free to add the changes, otherwise i'll take care of it!

smoothdeveloper commented 12 months ago

I'll get to it unless it is breaking anyone right now or is urgent, as I'd like to just grasp the thing some more and get familiar with how tests are done in the repository.

smoothdeveloper commented 11 months ago

A small note about the SAM record, it has the ID field hardcoded to string but I believe we'd want to make this a generic type parameter (that is IComparable).

I've not looked how it would impact client code, but unless there is a "ID" of items we put in SAM is settled to be a string, always, I think it is worth adjusting to make it generic.

The main caveat is, people who haven't used statically typed languages, with generics, may find this a bit "head scratching" at first.

If there is interest, I'll make separate issue and check the outcome of making this record field generic.

bvenn commented 11 months ago

I agree, that a generic type for the ID field would be beneficial. The only draw back I can see is, that the type would contain a generic type annotation which makes the type a little bit more complex but the benefit definetely greater.

Testing.SAM.SAMResult becomes Testing.SAM.SamResult<'a>

While in most cases the identifier would be a UniProt accession number (string) or some other kind of domain-id, it definetely is possible to have e.g. BioFSharp.IO.FastA.FastaItem<BioArray<Nucleotide>> as bioitem identifier.

If you want, you can take care of it, or I can quickly modify the SAM type🚀

smoothdeveloper commented 11 months ago

@bvenn, I'll make a PR for the ID field so we can review the impact, you are right, it bubbles to SAMResult as well. Thanks for the feedback!

smoothdeveloper commented 11 months ago

Closing the issue as the fixed documentation has made it online :)