Proposed fix for issue #20 test with blank_line.las

dcslagel commented 4 years ago

@JustinGOSSES,

This is a proposed fix for issue #20 test with blank_line.las. It passes 'test-all' and 'test-las2json.

The change refactors building curve headers per 2.0 spec. The LAS_20_Update_Jan2014.pdf of the LAS specifications says the ~CURVE INFORMATION section must have curves (curve names) in the order they (the data columns) appear in the ~ASCII log data section. Given that interpretation, I removed the other options for building the Curve name array and just rely on getting the names from the CURVE INFORMATION BLOCK.

Reference: http://www.cwls.org/wp-content/uploads/2014/09/LAS_20_Update_Jan2014.pdf https://github.com/kinverarity1/lasio/blob/master/standards/LAS_20_Update_Jan2014.pdf

If it is desirable to have the fall back options of getting the names from the ~ASCII section (if they are there) and lastly generating a set of generic column headers, let me know and I'll add them back in.

Let me know if this change could be accepted (or rejected) or needs some additional changes before merging.

Thanks,

DC

JustinGOSSES commented 4 years ago

Curve names in the curve block being in different order from the curve names in curve data, or absent, seems like a problem that is likely to occur. It shouldn't occur if people followed the LAS guidelines exactly, but they don't so have to handle some of those LAS inaccuracies.

I'm relunctant to have code that returns to the user curve data under the wrong curve name, which I think will happen with the code in this pull request if I'm understanding it correctly? I'd prefer the code either be able to handle these types of situations correctly or recognize it and fail with appropriate error or warning.

Therefore, I'd prefer to leave in the longer more complicated code and not accepting this pull request as is.... but definitely willing to listen to counter arguments if you feel strongly about it.

dcslagel commented 4 years ago

@JustinGOSSES,

Yeah, that's the feedback this change needed. It sounds like there is a possibility of tweaking this branch to make it better than the current production implementation. Would the following logic meet the need:

Check ~A for column names and use those if found
Else if no ~A col names, then check Code Information Block for column names and use those if found
Else use generic column names : col-1, col-2, col-3,...

Or is there another algorithm option that would be good?

Thanks,

DC

kinverarity1 commented 4 years ago

Hey - thought I'd throw my 2 cents in, in the interest of keeping wellio.js and lasio behaviour consistent!

I've found many of the LAS files in our archive have inconsistent mnemonics in this way - generally the names in the ~A are truncated to the width of the numeric column. Which can lead to duplicate names in the ~A set whereas those in ~C are unique. I've found it a nightmare to manage.

So... I've stuck with only using the names in ~C, as they are very very rarely absent. My only intention for the names in ~A was to provide an argument for the user to provide which would cause the parser to warn (or fail) if the ~C and ~A curve names differ (haven't actually done it yet though)

kinverarity1 commented 4 years ago

I'm curious though: do you find cases where curves have been reordered, without changing the ~C name order?

JustinGOSSES commented 4 years ago

@kinverarity1 Less that I've found lots of cases of it and more that I am assuming that some people making LAS would make that mistake.

Thanks for adding your experience!

@dcslagel let's follow the lasio approach

dcslagel commented 4 years ago

@JustinGOSSES, @kinverarity1

Apologies, I am not sure I understand if and what further modifications are needed.

Since the current proposed change matches Lasio's approach of using only the ~C content for the curve/data column headers and this change passes the current set of tests, is there any additional modification that should be made in this proposed change?

Let me know if this change could be accepted (or rejected) or needs some additional changes before merging.

Thanks,

DC

Sidenote: My thinking when developing this pull request is that it wouldn't be the full robust processing of the data column headers (mnemonics). It would be a step towards that state. Adding subsequent related test cases and the needed code changes to pass each new test case (and keep the test suite passing) will iteratively build to a full robust processing of the data column headers. Is that your thinking also or something else?

JustinGOSSES / wellio.js

Proposed fix for issue #20 test with blank_line.las #25