Closed surilindur closed 3 months ago
Totals | |
---|---|
Change from base Build 8938887278: | 0.0% |
Covered Lines: | 112 |
Relevant Lines: | 112 |
To be honest, I don't remember why we had these two passes 😅 It may just be a historical thing that is not necessary anymore. So if you notice that experiments still run fine with a single pass, I'm definitely open to this.
Just let me know when this is ready for review. Will be a breaking change of course.
So if you notice that experiments still run fine with a single pass, I'm definitely open to this.
The experiments should run fine, I just double-checked and the output is identical when the phases are combined (file hash-wise, and the files with different hashes - namely WebIDs - just have the triples in a different order).
When this is released, it would require updating the jbr experiment configurations, though, but I also have a branch to do that. I will look into making a PR for it later.
This should be ready for review now!
Thanks!
Do let me know if no more changes are needed and a new (major) update should be released.
I think it might make sense to wait for the multiple interfaces work before releasing the next major. I will try to make a PR later this week.
This is an idea I had while working on the multiple interfaces thing, to combine the main and auxiliary fragmentations into one pass. The auxiliary phase is only used to add some noise, and the configs are mostly identical, so it feels like they could be combined.
This would make it easier to, for example, have fragmentation strategies and sinks defined in the main config that would apply to both the main and auxiliary data together, and would remove any challenges resulting from having two different passes potentially writing in the same files or trying to gather a summary of the data and missing out the auxiliary or primary data as a whole.
Any thoughts are welcome, I do not know if this is acceptable from some other points of view or not. It was never clear to me why the auxiliary phase was separate, since in practice it probably does not need to be (the way it is implemented now).