VirtualPhotonics / VTS

Virtual Tissue Simulator
https://virtualphotonics.org

Reconfigure the detectors to use global code for the Binary Serializers #112

Closed lmalenfant closed 10 months ago

lmalenfant commented 1 year ago

The Detectors section of MonteCarlo has a lot of duplicated code, and every time we add another detector, the duplication grows. Consolidating the Binary Serializers across the detectors would reduce this duplication.

dcuccia commented 1 year ago

We should agree on a design for this up front before implementing - IMO a good reason to have discussions that get promoted to issues.

lmalenfant commented 1 year ago

I agree, but I already started a POC and needed to commit the code somewhere, so I created a branch where I can try things out. This will probably take some time, and I really need to try things out in code before coming up with a design. If you prefer that I work on a fork, I can do that.

dcuccia commented 1 year ago

A fork could work, or, as a middle-of-the-road option, we could rename this ticket to something like "spike - experiment with decoupling detector serialization".

In my opinion, there's more refactoring to consider for the detectors beyond serialization, which could make this easier to implement with a lower touch/surface area. That larger piece could interact with our serialization strategy.

I may have a little bit of time opening up to complete the initial python code and work on this with you on a spike branch where we intend to throw away the code and implement the final design we come up with.

dcuccia commented 11 months ago

Hi friends. Did a little thinking and have two (hopefully constructive) pieces of feedback on this ticket:

1) Per our live conversation on naming, I thought a little bit about the naming and namespace location of BinaryArraySerializerFactory. I think it's appropriate. It does what it says - it has one static factory method that creates BinaryArraySerializer(s). And in terms of discovery it's probably in the right place. See this justification, for example.

The alternative would be to put the CreateSerializer implementation into the BinaryArraySerializer class directly, which would be ok with me (but see caveat below). Many classes have factory creation methods. To lead developers into "the pit of success", they also make the class constructor private, so there's "only one way" to create the class (to minimize confusion/maximize correctness). If we did that, we'd probably want to clean up (remove) the public "setter" components of the BinaryArraySerializer, so that there's truly only one way they are instantiated - through the factory method - and mutation (source of bugs) is not allowed.
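To illustrate the shape I'm describing, here's a quick sketch (made-up names - this is not the real BinaryArraySerializer API, just the private-constructor + static-factory pattern):

using System;
using System.IO;

// Sketch only - illustrative names, not the real BinaryArraySerializer members.
public sealed class ArraySerializer
{
    public string Name { get; }
    public Action<BinaryWriter> WriteData { get; }
    public Action<BinaryReader> ReadData { get; }

    // private constructor: the static factory method below is the only way to create an instance
    private ArraySerializer(string name, Action<BinaryWriter> writeData, Action<BinaryReader> readData)
    {
        Name = name;
        WriteData = writeData;
        ReadData = readData;
    }

    // the single creation path ("pit of success") - no public setters to mutate later
    public static ArraySerializer Create(double[] data, string name) =>
        new ArraySerializer(
            name,
            writer => { foreach (var d in data) writer.Write(d); },
            reader => { for (var i = 0; i < data.Length; i++) data[i] = reader.ReadDouble(); });
}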

Caveat - as I've leaned into more "functional programming" modalities, I've tended to see the benefit of the segregation of constructed classes from the "helper" methods that allow their creation/manipulation. Think LINQ - all those Select, Where operators etc are defined in separate classes from the collection classes that "hold" the data.

Another note: I see that the Obsolete attribute warning on the BinaryArraySerializer.DataArray property member is not present on this new feature branch. I think it's important to remove (and remove assignments in all of the implementations), because it incorrectly exposes/guides improper use of the functionality, and is never used/needed anyway.

Lmk what y'all think - I can make any of the above changes if we want.

2) I think I figured out a relatively safe/extensible way to flatten/remove all the various "for loops" in the factory (487 lines of code down to 107). See my Gist here:

https://gist.github.com/dcuccia/552c52716211cfecaf48dd84fe254f8b

It doesn't change the surface area of the public API; nothing in the detector code would change. It could make the factory method consolidation into BinaryArraySerializer a little more palatable, if we were going that direction. I haven't done much testing, but this should be a "good performance" version without any boxing or copying. Let me know what you think.

lmalenfant commented 11 months ago

@dcuccia thank you for thinking this through! Let me think about your suggestions and I will also take a look at your Gist when I have more time.

Another note: I see that the Obsolete attribute warning on the BinaryArraySerializer.DataArray property member is not present on this new feature branch. I think it's important to remove (and remove assignments in all of the implementations), because it incorrectly exposes/guides improper use of the functionality, and is never used/needed anyway.

Just a quick note on this: we didn't forget it. @hayakawa16 has made a note to add it back in once all the work on the detectors is complete, because it is not obsolete until then, and leaving it off avoids many warnings on the unchanged detectors while the code is in progress. Good eye noticing that though :)

dcuccia commented 11 months ago

Cool, thanks on both accounts!

hayakawa16 commented 11 months ago

I added the Obsolete attribute to DataArray in BinaryArraySerializer and all unit tests run fine. I think that means I have updated all detectors? So the outstanding items on this branch are potential modifications to BinaryArraySerializerFactory, and possibly modifying the AllDetectorTests to use each detector class to write and read binary so that other binary arrays (besides Mean and SecondMoment) get tested too (I think this was Lisa's suggestion?). I can work on the latter.

dcuccia commented 11 months ago

Thanks Carole. I think there are still 53 references to BinaryArraySerializer.DataArray throughout the solution. Can we remove these obsolete assignments? (they're not used). For tests, should we create an integration test class for each named detector, to test the specific serialization expectations of that particular detector (and possibly share any boilerplate code needed between those tests)?

hayakawa16 commented 11 months ago

Hi, I did a Find on BinaryArraySerializer.DataArray and selected "Entire Solution" and the only one that is shown is in Vts.xml. How are you searching?

hayakawa16 commented 11 months ago

Are you looking at the feature/112-reconfigure-the-detectors-to-use-global-code-for-the-binary-serializers_2 (note "_2" at end) branch? This is the new branch Lisa and I made that picked out only those changes that support your implementation (other trial code was omitted).

dcuccia commented 11 months ago

I was looking at _2. Is this not the right branch?

hayakawa16 commented 11 months ago

Yes! How did you find those 53 references?

dcuccia commented 11 months ago

VS said there were 53 usages. On my way home and will try to repro there.

hayakawa16 commented 11 months ago

Thanks! Also have you pulled the latest? I've been busy updating detectors this week.

dcuccia commented 11 months ago

Yes, you have! <3

After rebuilding, it says it's actually 49 references:

[screenshot]

Here are all the examples; these lines can be removed:

[screenshot]

dcuccia commented 11 months ago

Oh, that's weird. Might be a Visual Studio cache issue...sorry stand by.

hayakawa16 commented 11 months ago

Wait, now I'm seeing them. Okay, let me work on those detectors. I kind of feel like I modified these detectors already. Well oh well. Deja vu all over again!

dcuccia commented 11 months ago

I cloned fresh and there were only 23, with a couple on the tests side:

[screenshot]

hayakawa16 commented 11 months ago

Thanks for the helpful screenshot. I modified all of those files. The only places the string "DataArray" shows up now are in Vts.xml and as the [Obsolete] DataArray in BinaryArraySerializer and BinaryArraySerializerFactory.
Unit tests pass and I pushed my fixes.

lmalenfant commented 11 months ago

This conversation reminded me that when I was working with the serialization code, it didn't appear that Dimensions in BinaryArraySerializer is used either. @dcuccia & @hayakawa16 could you take a look and see if that is correct, or if it is being used elsewhere?

hayakawa16 commented 11 months ago

I found no references to BinaryArraySerializer.Dimensions. Shall I add the Obsolete attribute?

dcuccia commented 11 months ago

Good call - that is no longer needed, and was new as of this effort. Please delete it.

lmalenfant commented 11 months ago

@dcuccia it was not new, it's existing code; I noticed it wasn't being used when I first started the work on the detectors. We should probably put the Obsolete attribute on it just in case it was being used externally, and then remove it when we remove DataArray.
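Something like this, just as a sketch (I'm guessing at the property type and the message wording):

// keeps external callers compiling, but warns them that the property is going away
[Obsolete("Dimensions is not used and will be removed along with DataArray.")]
public int[] Dimensions { get; set; }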

dcuccia commented 11 months ago

My mistake. I think I'd added a GetDimensions method and removed it, which is what had confused me. Sounds good.

hayakawa16 commented 11 months ago

I pushed the modification adding the Obsolete attribute to Dimensions.

hayakawa16 commented 11 months ago

I've been thinking about what you said concerning unit tests for this new code: "For tests, should we create an integration test class for each named detector, to test the specific serialization expectations of that particular detector (and possibly share any boilerplate code needed between those tests)?"

There is existing unit test code called AllDetectorTests. It gets a list of all detectorInputs from the sample input files (from geninfiles). It looks like this:

public void Verify_detector_classes()
{
    // use factory to instantiate detector with CreateDetector and call Initialize
    var tissue = new MultiLayerTissue();
    var rng = new MersenneTwister(0);
    foreach (var detectorInput in _detectorInputs)
    {
        // turn on TallySecondMoment so can flow through that code
        ((dynamic)detectorInput).TallySecondMoment = true;

        // factory generates IDetector using CreateDetector,
        // then calls detector.Initialize method
        var detector = DetectorFactory.GetDetector(detectorInput, tissue, rng);

        // DetectorIO methods call detector.GetBinarySerializers.WriteData and ReadData
        // check that can write detector
        DetectorIO.WriteDetectorToFile(detector, "");

        // check that can read detector
        var readDetector = DetectorIO.ReadDetectorFromFile(detector.Name, "");
        Assert.IsInstanceOf<IDetector>(readDetector);
    }
}

I was thinking of extending this test to have each detector's GetBinarySerializers tell me what arrays need to be written and then read back in for each detector. This is not what you were suggesting, though.

When you say "integration" do you mean testing after running a simulation? If so, is this needed? And I think you are suggesting a separate unit test for each detector rather than this all-in-one test above? I will think on this.

dcuccia commented 11 months ago

Hi Carole - great thoughts. I have a wall of work this afternoon, but will try to respond tonight or tomorrow morning with some explanation and sample code.

hayakawa16 commented 11 months ago

Thanks for your help! Anytime is good.

dcuccia commented 11 months ago

My first thought is that there's a lot being tested all at once in these tests. Even if we were just testing a single detector, we'd be exercising:

  1. DetectorFactory.GetDetector
  2. DetectorIO.WriteDetectorToFile
  3. DetectorIO.ReadDetectorFromFile
  4. Tissues and RNGs that aren't relevant for the test

...and all the infrastructure beneath those items. Some would call this type of integration test an "end-to-end" test that tries to exercise the full "stack" of dependencies. And the only thing we confirm at the end is that the detector is not null.

The additional concern we are thinking to address, conversely, is confirming that the specific properties of the detector (and new ones will have new properties...) are written and read correctly. Only the specific detector "knows" about this, so the specific detector code should probably have its own corresponding test. There are two parts even to testing this smaller component:

a) if we want to test that the BinarySerializer code works along with the associated file IO, then we'd populate known numbers in a known/correct order (but they can be fake - they don't need to be accurate from a physics perspective), create a BinarySerializer instance with a BinaryWriter and BinaryReader that are writing/reading to/from a file, call the write action, read the numbers back with the read action, and confirm they are correct. This would be an "integration" test, because it's testing both the BinarySerializer code and its dependency, the file IO functions.

b) if we want to test that the BinarySerializer code works, independent of the file IO, then we'd populate known numbers in a known/correct order, create a BinarySerializer instance with a BinaryWriter and BinaryReader that are writing/reading to/from a "virtual" store like a MemoryStream, call the write action, and then read them back with the read action, and confirm the numbers are correct. This would be a "unit" test, because it's testing that functionality in isolation from any of its dependencies.

Generally, unit tests can be faster and more resilient, because they're testing only a single "layer" of the codebase. And they often seek to minimize dependencies on IO or async operations, where there can be "side effects" and other system-dependent reasons why the code may fail. But on the other hand, integration tests can "make sure" all the pieces work together. Usually, having a combination of them can be helpful - unit tests can give good coverage for each feature, and be used to define the expected behavior of that unit (the best documentation, IMO), and integration tests can be a good reality check. They may be more computationally expensive (slower to run), and maybe more difficult to debug (because errors can happen anywhere in the stack), but they also build confidence in the entire system.

So, in terms of a suggestion, perhaps we try to complement this nice end-to-end test with more isolated unit and integration tests, to verify individual detectors' expectations? I will write a couple of examples and post them here.

dcuccia commented 11 months ago

At the lower level, here are two examples of how we could unit test each of the accepted BinaryArraySerializer input types:

[TestFixture]
public class BinaryArraySerializerTests
{
    [Test]
    public void Verify_BinaryArraySerializer_roundtrip_Complex1D()
    {
        // create a complex array and a serializer for it
        var dataArray = new Complex[] { (Complex)0, (Complex)1 , (Complex)2 };
        var serializer = BinaryArraySerializerFactory.GetSerializer(dataArray, "testc1d", "_testc1d");

        // write the array to a virtual stream
        using var stream = new MemoryStream();
        using var writer = new BinaryWriter(stream);
        serializer.WriteData(writer);

        // zero out the detector values
        Array.Clear(dataArray);

        // reset the stream to the beginning
        stream.Seek(0, SeekOrigin.Begin);

        // read the values back in
        var reader = new BinaryReader(stream);
        serializer.ReadData(reader);

        // check that the values are correct
        Assert.AreEqual((Complex)0, dataArray[0]);
        Assert.AreEqual((Complex)1, dataArray[1]);
        Assert.AreEqual((Complex)2, dataArray[2]);
    }

    [Test]
    public void Verify_BinaryArraySerializer_roundtrip_Double2D()
    {
        // create a 2D double array and a serializer for it
        var dataArray = new double[2,3]
            {
                { 0, 1, 2 },
                { 3, 4, 5 },
            };
        var serializer = BinaryArraySerializerFactory.GetSerializer(dataArray, "testd2d", "_testd2d");

        // write the array to a virtual stream
        using var stream = new MemoryStream();
        using var writer = new BinaryWriter(stream);
        serializer.WriteData(writer);

        // zero out the detector values
        Array.Clear(dataArray);

        // reset the stream to the beginning
        stream.Seek(0, SeekOrigin.Begin);

        // read the values back in
        serializer.ReadData(new BinaryReader(stream));

        // check that the values are correct
        Assert.AreEqual(0.0, dataArray[0, 0]);
        Assert.AreEqual(1.0, dataArray[0, 1]);
        Assert.AreEqual(2.0, dataArray[0, 2]);
        Assert.AreEqual(3.0, dataArray[1, 0]);
        Assert.AreEqual(4.0, dataArray[1, 1]);
        Assert.AreEqual(5.0, dataArray[1, 2]);
    }
}

Aside - I just noticed that the "PopulateFromEnumerable" extension methods use the opposite indexing from the detector IO. It's worth checking where we use that method. Typically, the right-most dimension in a multi-dimensional array is written sequentially to disk, as that's "row major" format and how it's handled in memory in .NET (see the double2d unit test above). PopulateFromEnumerable would be fine for use when copying from one array to another, but if it's used for reading/writing to disk, we should evaluate this discrepancy between that and this new code before merging.
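To illustrate what I mean by row-major order (sketch only, not code from the branch):

// Row-major: for a double[2,3], elements are stored and written as
// [0,0], [0,1], [0,2], [1,0], [1,1], [1,2] - the right-most index varies fastest.
var data = new double[2, 3] { { 0, 1, 2 }, { 3, 4, 5 } };
using var writer = new BinaryWriter(new MemoryStream());
for (var i = 0; i < data.GetLength(0); i++)        // rows (left index, slowest)
    for (var j = 0; j < data.GetLength(1); j++)    // columns (right index, fastest)
        writer.Write(data[i, j]);
// Column-major (e.g. MATLAB) would instead write [0,0], [1,0], [0,1], [1,1], [0,2], [1,2].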

hayakawa16 commented 11 months ago

Thank you very much for your analysis of the existing unit test and its drawbacks, and suggestions for added improvements!

I understand your comments about having tests that test just the single layer of the class itself. I noticed that you did not commit this code to test the BinaryArraySerializer class. I will see if I can implement it in the branch next week.

Good comment about PopulateFromEnumerable. It looks like it is used by many classes. I guess the key is whether those classes write/read to disk. One such class is ArrayCustomBinaryReader.ReadFromBinary, which in turn is called by FileIO.ReadFromBinary (not called by anything but its unit test) and DatabaseReader.ReadFromBinary (which uses it in FromFileInResources). I will perform an analysis to make sure this doesn't interfere. I'm somewhat confident all is okay because, for all of the detectors, loadMCResults.m loads the binary in "column major" order and that has been working.

Thank you for your efforts!

dcuccia commented 11 months ago

DetectorIOTests already has some decent tests for round-tripping the individual detector binary arrays. From a unit test perspective, these are really mostly testing the individual detectors, not whether DetectorIO functions are working. We could improve this by:

1) segregating a couple of unit tests to check that DetectorIO.WriteDetectorToFile and DetectorIO.ReadDetectorFromFile are doing what they're supposed to (i.e. testing the internal logic, writing/reading files correctly, etc.). This would be as simple as having one or two dummy detectors with different shapes to exercise the static methods. And then...

2) narrowing the individual detector tests to unit tests, without any code that depends on DetectorIO, and putting those tests in files named, for example, FluenceOfRhoAndZDetectorTests.cs

For example, the original test written like this:

        [Test]
        public void validate_FluenceOfRhoAndZDetector_deserialized_class_is_correct_when_using_WriteReadDetectorToFile()
        {
            string detectorName = "testfluenceofrhoandz";
            IDetectorInput detectorInput = new FluenceOfRhoAndZDetectorInput()
            {
                Rho = new DoubleRange(0, 10, 3),
                Z = new DoubleRange(0, 1, 4),
                TallySecondMoment = false, // don't tally SecondMoment
                Name = detectorName,
            };
            var detector = (FluenceOfRhoAndZDetector)detectorInput.CreateDetector();
            detector.Mean = new double[,] {{1, 2, 3}, {4, 5, 6}};

            DetectorIO.WriteDetectorToFile(detector, "");
            var dcloned = (FluenceOfRhoAndZDetector)DetectorIO.ReadDetectorFromFile(detectorName, "");

            Assert.AreEqual(detectorName, dcloned.Name);
            Assert.AreEqual(1, dcloned.Mean[0, 0]);
            Assert.AreEqual(2, dcloned.Mean[0, 1]);
            Assert.AreEqual(3, dcloned.Mean[0, 2]);
            Assert.AreEqual(4, dcloned.Mean[1, 0]);
            Assert.AreEqual(5, dcloned.Mean[1, 1]);
            Assert.AreEqual(6, dcloned.Mean[1, 2]);
        }

...could be just slightly re-written like this:

[Test]
public void validate_deserialized_binary_arrays_are_correct()
{
    string detectorName = "testfluenceofrhoandz";
    var detector = new FluenceOfRhoAndZDetector()
    {
        Rho = new DoubleRange(0, 10, 3),
        Z = new DoubleRange(0, 1, 2),
        TallySecondMoment = true, // tally SecondMoment so its array is allocated and round-tripped
        Name = detectorName,
    };
    detector.Mean = new double[,] { { 1, 2, 3 }, { 4, 5, 6 } };

    var serializers = detector.GetBinarySerializers();
    using var stream = new MemoryStream();
    using var writer = new BinaryWriter(stream);
    foreach (var serializer in serializers)
    {
        serializer.WriteData(writer);
    }

    Array.Clear(detector.Mean);
    Array.Clear(detector.SecondMoment);

    stream.Seek(0, SeekOrigin.Begin);

    using var reader = new BinaryReader(stream);
    foreach (var serializer in serializers)
    {
        serializer.ReadData(reader);
    }

    Assert.AreEqual(1, detector.Mean[0, 0]);
    Assert.AreEqual(2, detector.Mean[0, 1]);
    Assert.AreEqual(3, detector.Mean[0, 2]);
    Assert.AreEqual(4, detector.Mean[1, 0]);
    Assert.AreEqual(5, detector.Mean[1, 1]);
    Assert.AreEqual(6, detector.Mean[1, 2]);
}

Note that I also limited the test to using the detector class constructor itself, as opposed to relying on a different component (isolation, again).

hayakawa16 commented 11 months ago

I see your improvement! Wouldn't we need this type of unit test for each detector? (the unit test name change to omit FluenceOfRhoAndZDetector confused me).

dcuccia commented 11 months ago

In order to scale productivity for writing and maintaining those tests, we can create a DetectorBinarySerializationHelper class:

internal static class DetectorBinarySerializationHelper
{
    internal static void WriteClearAndReReadArrays(IDetector detector, params Array[] arrays)
    {
        var serializers = detector.GetBinarySerializers();
        using var stream = new MemoryStream();
        using var writer = new BinaryWriter(stream);
        foreach (var serializer in serializers)
        {
            serializer.WriteData(writer);
        }

        foreach (var array in arrays)
        {
            Array.Clear(array);
        }

        stream.Seek(0, SeekOrigin.Begin);

        using var reader = new BinaryReader(stream);
        foreach (var serializer in serializers)
        {
            serializer.ReadData(reader);
        }
    }
}

So that the specialized detector test becomes:

        [Test]
        public void validate_deserialized_binary_arrays_are_correct()
        {
            string detectorName = "testfluenceofrhoandz";
            var detector = new FluenceOfRhoAndZDetector()
            {
                Rho = new DoubleRange(0, 10, 3),
                Z = new DoubleRange(0, 1, 2),
                TallySecondMoment = true, // tally SecondMoment
                Name = detectorName,
            };
            detector.Mean = new double[,] { { 1, 2, 3 }, { 4, 5, 6 } };

            DetectorBinarySerializationHelper.WriteClearAndReReadArrays(detector, detector.Mean, detector.SecondMoment);

            Assert.AreEqual(1, detector.Mean[0, 0]);
            Assert.AreEqual(2, detector.Mean[0, 1]);
            Assert.AreEqual(3, detector.Mean[0, 2]);
            Assert.AreEqual(4, detector.Mean[1, 0]);
            Assert.AreEqual(5, detector.Mean[1, 1]);
            Assert.AreEqual(6, detector.Mean[1, 2]);
        }

In this way, we can share code as much as possible, while having detector-specific code handled by the author of that detector, with specific knowledge of its expected shape and function.

Finally, as we've talked about, there are definitely more things we can do to use generics to "wrap" the functionality of standard configurations, like classes that only have Mean and SecondMoment. E.g. we could have an IDetectorWithMeanAndSecondMoment and then have standardized serialization and class testing based on that shape. But that gets back to the earlier question/concern/debate about complexity, and whether it's worth it. Someone writing and testing a class with that new structure would be fine, but someone doing something more custom wouldn't benefit from the grouping we'd done, and might be confused as to why they have to "jump off the cliff" to write and test their solution.
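To make that concrete, the generic wrapping could look roughly like this (all of these names are hypothetical - nothing like this exists in the branch - and I've fixed the rank at 2 for simplicity, assuming IDetector exposes GetBinarySerializers as the helper above already relies on):

// Hypothetical shared shape for the "standard" detectors.
public interface IDetectorWithMeanAndSecondMoment : IDetector
{
    double[,] Mean { get; set; }
    double[,] SecondMoment { get; set; }
}

internal static class StandardDetectorSerializationTester
{
    // One shared round-trip check for any detector exposing the standard Mean/SecondMoment shape.
    internal static void AssertBinaryArraysRoundTrip<TDetector>(TDetector detector)
        where TDetector : IDetectorWithMeanAndSecondMoment
    {
        var expectedMean = (double[,])detector.Mean.Clone();
        var expectedSecondMoment = (double[,])detector.SecondMoment.Clone();

        // reuse the helper above to write, clear, and re-read the arrays
        DetectorBinarySerializationHelper.WriteClearAndReReadArrays(
            detector, detector.Mean, detector.SecondMoment);

        CollectionAssert.AreEqual(expectedMean, detector.Mean);
        CollectionAssert.AreEqual(expectedSecondMoment, detector.SecondMoment);
    }
}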

hayakawa16 commented 11 months ago

I understand your concept! The majority of detectors only have Mean and SecondMoment to de/serialize so that would work. For the other detectors, would we just add the other arrays to the "params Array[] arrays"? And then of course, add Assert code.

dcuccia commented 11 months ago

Yep, that's it exactly - each detector-specific test will know which arrays to add to that params array. Sorry I wasn't replying in the middle; I was trying to get a couple of posts out in sequence before heading to my niece's play.

Thanks for offering to do an analysis of the populate code. I think we should really decide going forward if we want to flip all the logic here. Probably much better if things are consistent, and keeping legacy compatibility is a big deal.

hayakawa16 commented 11 months ago

I rewrote the test and test title for FluenceOfRhoAndZDetector. It looks like this:

        [Test]
        public void Validate_FluenceOfRhoAndZDetector_deserialized_class_is_correct_when_using_GetBinarySerializers()
        {
            const string detectorName = "testfluenceofrhoandz";
            var detector = new FluenceOfRhoAndZDetector
            {
                Rho = new DoubleRange(0, 10, 3),
                Z = new DoubleRange(0, 1, 2),
                TallySecondMoment = true, // tally SecondMoment
                Name = detectorName,
                Mean = new double[,] { { 1, 2, 3 }, { 4, 5, 6 } },
                SecondMoment = new double[,] { { 7, 8, 9 }, { 10, 11, 12 } }
            };

            DetectorBinarySerializationHelper.WriteClearAndReReadArrays(detector, detector.Mean, detector.SecondMoment);

            Assert.AreEqual(1, detector.Mean[0, 0]);
            Assert.AreEqual(2, detector.Mean[0, 1]);
            Assert.AreEqual(3, detector.Mean[0, 2]);
            Assert.AreEqual(4, detector.Mean[1, 0]);
            Assert.AreEqual(5, detector.Mean[1, 1]);
            Assert.AreEqual(6, detector.Mean[1, 2]);
            Assert.AreEqual(7, detector.SecondMoment[0, 0]);
            Assert.AreEqual(8, detector.SecondMoment[0, 1]);
            Assert.AreEqual(9, detector.SecondMoment[0, 2]);
            Assert.AreEqual(10, detector.SecondMoment[1, 0]);
            Assert.AreEqual(11, detector.SecondMoment[1, 1]);
            Assert.AreEqual(12, detector.SecondMoment[1, 2]);
        }

I created the class DetectorBinarySerializationHelper per the post above. I wasn't sure where to put it. Currently it is in the Detectors folder, but I could see arguments for it being in the Helpers folder or the IO folder too. Let me know your ideas.

If the above code looks correct, I will continue on with the other unit tests.

dcuccia commented 11 months ago

Hi Carole,

I will have time later to check out the code, but I find this type of code is useful to have as a test-only library helper (it's not relevant for the production code, just the tests). It could be in a Helpers folder in the test library, or, if multiple test libraries need access to it, we could create a Vts.Test.Common or similar library.

In terms of location, this would be testing the serialization of that specific class, so I assume we want a 1:1 relationship between the .cs test class and the detector (1 for each). This item unit tests the serialization, another could unit test the accuracy, etc.

lmalenfant commented 11 months ago

@hayakawa16 when I create helper methods for unit testing, I usually create a separate folder called helpers that doesn't have a matching folder in the project being tested. You could put that in the MonteCarlo folder or at the folder level of the Detectors depending on where it will be used.

I also wanted to mention something here about the tests in the VTS. When we were adding test coverage, we noticed that the majority of the tests were in fact integration tests and not true unit tests. Where possible, I added unit tests to increase the code coverage, but in some cases it wasn't possible due to the way the original code was written (static classes and methods, etc.). It would be good, once we have the coverage where we want it to be, to identify the areas where we are missing unit tests and add them.

This might be a good time to come up with a structure for the test application so we can easily differentiate unit tests and integration tests. In my projects I name the unit test classes the same as the class being unit tested and then I put the integration tests in a separate location under an IntegrationTests folder because they usually include many classes. Any suggestions for this would be welcome.

dcuccia commented 11 months ago

Great points Lisa. At Modulim, we go as far as to have separate libraries, e.g. Vts.Test.Unit and Vts.Test.Integration. We run the Unit tests in CI for every branch commit, and run the Integration tests once a (non-draft) PR is created.

lmalenfant commented 11 months ago

@dcuccia I like that idea! I was going to say separate projects but I didn't know if that was overkill. The fact that you used it at Modulim is good because it sounds like it worked well.

dcuccia commented 11 months ago

It does. The other way to do it is to flag tests/test classes as integration or unit or e2e, and use a command line switch to run one or the other. We have some very slow running tests that depend on an outside service that we only run before we build a production release candidate, and we use flags for those.
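For example, with NUnit categories (sketch only - the category name is a placeholder, and the exact --filter property depends on the test adapter):

// Tag slow or externally-dependent tests with a category...
[TestFixture, Category("Integration")]
public class DetectorIOIntegrationTests
{
    [Test]
    public void Writes_and_reads_a_detector_through_the_file_system()
    {
        // ...end-to-end logic that touches the file system goes here
    }
}

// ...then select which set to run at the command line:
//   dotnet test --filter TestCategory!=Integration   // fast unit-test run on every commit
//   dotnet test --filter TestCategory=Integration    // full run when a PR is created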

hayakawa16 commented 11 months ago

I like these ideas. I have a few more nuts-and-bolts questions:

1) For the single-value detectors (e.g. ATotalDetector), GetBinarySerializers() => null; is in the code. So I plan to leave those tests as they are?

2) I put DetectorBinarySerializerHelper for now in Vts.Test/MonteCarlo/IO/. We may want to move it around later; I just needed it somewhere to update the existing unit tests.

3) There are currently 18 detectors in DetectorIOTests. I have 54 more tests to add. Can I go ahead and do this, and after that we can move things around based on "unit" vs "integration" tests? Or would you rather I do that before I fill out these tests?

4) These updated 18 unit tests found an error in the way the new code in GetBinarySerializers was written in some detectors. So that is good! It turns out you need to define the SecondMoment arrays without checking if TallySecondMoment is true, i.e. for ReflectedMTOfRhoAndSubregionHistDetector:

Mean ??= new double[Rho.Count - 1, MTBins.Count - 1];
SecondMoment ??= new double[Rho.Count - 1, MTBins.Count - 1];
var allSerializers = ...

rather than

Mean ??= new double[Rho.Count - 1, MTBins.Count - 1];
if (TallySecondMoment)
{
  SecondMoment ??= new double[Rho.Count - 1, MTBins.Count - 1];
}
var allSerializers =...

Otherwise the SecondMoment array was not in the serializers list in WriteClearAndReReadArrays in DetectorBinarySerializerHelper.

I have pushed code in case you'd like to give it a look.

dcuccia commented 11 months ago

Defer to Lisa on 2 & 3.

IMO for 1, they should be returning an empty array, not null, and I'd make sure to "move" those tests to the same format as all others (testing serializers is just one part of unit testing the detector class - you'll also want to test correctness, etc).

Re: 4, I don't understand - why does SecondMoment need to be allocated if it's never used? That said, a "fully-initialized" class is always preferable from a code reliability perspective; I just thought we'd kept those null so that we could use all the available memory for the main detector, if that was a limitation. FWIW, if we intend to keep them possibly null, we should mark them with the nullable operator (i.e. double[,]? SecondMoment).
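E.g., a minimal sketch of what that annotation would look like on a detector-shaped class (not the actual detector code):

#nullable enable
// Sketch only: making the optional array's nullability explicit so the compiler flags unguarded use.
public class ExampleDetector
{
    public bool TallySecondMoment { get; set; }
    public double[,] Mean { get; set; } = new double[0, 0];
    public double[,]? SecondMoment { get; set; } // stays null unless TallySecondMoment is true
}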

hayakawa16 commented 11 months ago

Regarding 4, you're absolutely correct. When I was debugging code and stepping through, SecondMoment wasn't getting set somehow. I reverted code and all is good now.

hayakawa16 commented 11 months ago

Regarding 1, the single-valued detectors write their Mean and SecondMoment to the detector.txt ASCII file; there is no binary serialization going on. Do you still feel the same way?

dcuccia commented 11 months ago

Right - that's always been the case. The test, in that case, should just confirm the expectation - that GetBinarySerializers() returns an empty, non-null array.
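For example, something like this (sketch only - it assumes ATotalDetector keeps property names like the other detectors, and that we change the implementation to return Array.Empty<BinaryArraySerializer>() instead of null):

[Test]
public void Validate_ATotalDetector_GetBinarySerializers_returns_empty_nonnull_array()
{
    // single-value detectors have no binary arrays, so the serializer list should be empty, not null
    var detector = new ATotalDetector { TallySecondMoment = true, Name = "testatotal" };

    var serializers = detector.GetBinarySerializers();

    Assert.IsNotNull(serializers);
    Assert.IsEmpty(serializers);
}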

lmalenfant commented 11 months ago

I put DetectorBinarySerializerHelper for now in Vts.Test/MonteCarlo/IO/. We may want to move around later. I just needed it somewhere to update the existing unit tests.

I think that's a good place

There are currently 18 detectors in DetectorIOTests. I have 54 more tests to add. Can I go ahead and do this and after that we can move things around based on "unit" vs "integration" tests? Or would you rather I do that before I fill out these tests?

Yes, add them now and we can figure it out later; try to keep integration and unit tests separate, even if it is just with a comment in the same file.

lmalenfant commented 11 months ago

I agree with @dcuccia on 1 (returning an empty array vs null), but if this is too much to change now (this is prevalent in our code and needs resolving globally), we can leave it as is and fix it later.

hayakawa16 commented 11 months ago

Thanks for your feedback! I will continue along adding unit tests then.