Closed jakebeal closed 2 years ago
I think this is the spot: https://github.com/SynBioDex/pySBOL3/blob/5ae15b4f171991b3cd216b7548ffde7902f41c12/sbol3/document.py#L464
And following the template in @kenza12's pull request, I think we can fix by adding newline='\n'
to the arguments.
Note: I am considering this a bug, rather than an appropriate difference between Windows and Mac/Linux, because it will generates differences in version control where there should be none.
Now I am confused, because my attempt to replicate this failure is not working: the Windows test that I set up to fail if CRLF is being written is instead passing: https://github.com/SynBioDex/pySBOL3/runs/6369161798?check_suite_focus=true
@kenza12: can you reproduce a minimal example of CRLF being written in your Windows environment?
@jakebeal I think this is working for you because you have set up your test to use universal newlines. Read the documentation for the newline
option to open.
The test in sbol_factory uses filecmp.cmp
which seems to care about the size of the file in bytes, which would be different when there are different line endings.
On to this issue:
Why are different line endings on different platforms considered a bug? My read of the spec is that a valid N-Triples file can have any combination of CR and LF line endings. I think we should focus on the files carrying the same semantic payload, and not on the details of what line endings appear.
I think the sbol_factory test cases should not use filecmp.cmp
for checking the results of round tripping. pySBOL3 loads the RDF files in RDFlib and diffs the triples that got loaded, rather than worrying about a character by character difference.
My key question is this: what will git do? If universal newlines means that git won't see a change in newlines, then I'm fine with it.
Hi, Thank you for your feedback. I didn't encounter any bug with pysbol3 tests as you can see here : https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=491021&view=logs&jobId=240f1fee-52bc-5498-a14a-8361bde76ba0&j=240f1fee-52bc-5498-a14a-8361bde76ba0&t=06d62fd2-ab49-5add-9ee5-83b6882d7d2b
All the tests passed, even for test_write function in test_Document.py : https://github.com/SynBioDex/pySBOL3/blob/5ae15b4f171991b3cd216b7548ffde7902f41c12/test/test_document.py.
The bug in sbol_factory tests is appearing because of this file comparison where the assertion failed in https://github.com/SynBioDex/sbol_factory/blob/master/test/test_ontology.py as @tcmitchell mentionned.
assert filecmp.cmp(temp_name, original_file), "Files are not identical"
I think you can either change the comparison way or specify the newline.
@jakebeal said:
My key question is this: what will git do? If universal newlines means that git won't see a change in newlines, then I'm fine with it.
I don't understand what git has to do with it. git would definitely see a change in newlines if git were involved. That's the nature of things on different platforms. But git is not involved. We shouldn't be worried about generating byte-for-byte identical files, we should be worried about generating files that conform to their specification and that carry the same content.
There are flags in git to handle line endings across platforms, and GitHub has a good page about it. To me that's orthogonal to what this issue is about.
In short, I believe that N-Triples files, whether sorted or not, should have CRLF on Windows.
Per discussion, this isn't actually a bug, but is the correct behavior. I do think that we should document the potential challenge and solution in an easy place to find, and to that end have added a documentation issue on SBOL-utilities.
@kenza12 has created a pull request on SBOL-factory patching a test where Windows is writing files with a CRLF rather than just a newline: https://github.com/SynBioDex/sbol_factory/pull/70
It seems to me that this patch is actually indicating a bug in pySBOL3, which shouldn't be writing CRLF in the first place.