JMSLab / Template

Template for research repository using scons.
9 stars 1 forks source link

Allow for different logfile encodings #76

Closed rcalvo12 closed 1 year ago

rcalvo12 commented 1 year ago

If the sconscript.log file has characters that are not in "utf-8", scons breaks. This is particularly a problem when parallelizing the build. A simple fix would be allowing the builder to try again with "latin1" if it doesn't read "utf-8" on the first pass.

rcalvo12 commented 1 year ago

@jmshapir, I added the latin1 compatibility in https://github.com/JMSLab/Template/commit/731585662a95d4dff601e8a852d3bc4c09c4115e, did we want to add a unit test for this? I started considering how we would test, but have not gone through with implementing anything.

jmshapir commented 1 year ago

@rcalvo12 thanks!

It's probably good to add a unit test. Maybe we could try to add an input in tests/input that we know would induce an error with the default encoding, and test that no error is thrown?

rcalvo12 commented 1 year ago

@jmshapir, I did some more digging this weekend and a little this morning. As I had seaid in our offline conversation, I was unable to reproduce the error locally. Going back to the project which first prompted this issue, I realized that even in the original case the error does not occur locally. The error is only introduced when I try to run the original version of the builder on FASRC. This presents a problem in that unit tests will only catch that it is working properly if the unit test is done on FASRC.

Given this, do you think it might be better to put this issue aside for now and not merging this change? We now have this issue and branch publicly documenting the problem and the workaround in case future users encounter similar issues on their cluster systems.

jmshapir commented 1 year ago

@rcalvo12 thanks for looking into this!

I agree with your conclusion. We are only aware of one occurrence of the issue and we cannot reproduce it on other architectures. To me that says that a universal fix is likely to have a cost (in terms of maintenance/clarity) that is higher than its benefit.

I suggest we do as you say, close this issue with a stable link to the state of the code in the issue branch as of the closure, and delete the branch. If the issue recurs or we hear about it from other users, we can always revisit and base our solution off of the fix implemented on the branch here.

rcalvo12 commented 1 year ago

@rcalvo12 thanks for looking into this!

I agree with your conclusion. We are only aware of one occurrence of the issue and we cannot reproduce it on other architectures. To me that says that a universal fix is likely to have a cost (in terms of maintenance/clarity) that is higher than its benefit.

I suggest we do as you say, close this issue with a stable link to the state of the code in the issue branch as of the closure, and delete the branch. If the issue recurs or we hear about it from other users, we can always revisit and base our solution off of the fix implemented on the branch here.

Thanks @jmshapir! I will wrap this up shortly.

rcalvo12 commented 1 year ago

Summary:

In this issue, we set out to patch an issue which had occurred in another project repository. However, we were unable to reproduce the error locally and after further investigation it seems like the error is an idiosyncratic issue that only appears one dataset and only on FASRC.

Given the lack of reproducibility we decided in https://github.com/JMSLab/Template/issues/76#issuecomment-1629891633 to go ahead and close the issue without merging.

Final state of the issue branch here.