Closed paulmillar closed 2 years ago
Hi!
Thanks for your MR!
It seems we had similar ideas :) Support for this was merged yesterday internally in the development branch with support for multiple NULL strings.
It will be available soon in the next release! In the meantime you can try it from the development branch on Github.
Please let us know if that fixes the issue you're having ;)
Indeed, it's certainly encouraging that we had similar ideas. I'll definitely need to switch to the development branch!
Also, I noticed that my patch contains a mistake: the code looks for csvw:null under the csvw:Dialect, rather than the csvw:Table :see_no_evil:
My use-case is actually different. I would like to suppress specific assertions associated with the cell containing the null String value.
Here is an example:
"id";"acronym";"status";"nature";"rcn"
"824064";"ESCAPE";"SIGNED";"";"219246"
The first row is the header, the second row is the actual data. In this data, not all rows have a nature field and, for those with an empty string, I would like RMLMapper to suppress the corresponding assertions, rather than making assertions using the empty string value.
However, I do still want to generate an instance for this row, with all the non-empty cells contributing assertions.
Commit d14ac9d4 does something different. It suppresses the entire row if any of the cells contains the csvw:null value.
I can imagine this could be useful under certain circumstances; however, it's a different use-case from mine.
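To make the difference between the two behaviours concrete, here is a minimal Python sketch using the example data above. It is only an illustration of the two semantics, not RMLMapper's implementation: the function names, the (subject, field, value) tuples standing in for assertions, and the choice of "id" as the subject are all assumptions made for this example.

```python
import csv
import io

# The example file from above: header row plus one data row.
DATA = '''"id";"acronym";"status";"nature";"rcn"
"824064";"ESCAPE";"SIGNED";"";"219246"
'''

# Assumed csvw:null value(s) for this file: the empty string.
NULL_STRINGS = {""}


def assertions_per_cell(text, null_strings):
    """Cell-level behaviour: skip only the assertion for a null cell."""
    out = []
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        subject = row["id"]  # hypothetical choice of subject
        for field, value in row.items():
            if value in null_strings:
                continue  # suppress this one assertion, keep the rest
            out.append((subject, field, value))
    return out


def assertions_per_row(text, null_strings):
    """Row-level behaviour: drop the whole row if any cell is null."""
    out = []
    for row in csv.DictReader(io.StringIO(text), delimiter=";"):
        if any(value in null_strings for value in row.values()):
            continue  # the entire row is suppressed
        out.extend((row["id"], field, value) for field, value in row.items())
    return out
```

With the example row, the cell-level function still yields assertions for id, acronym, status and rcn, while the row-level function yields nothing at all.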
Going back to the definition of csvw:null:
An atomic property giving the string or strings used for null values within the data. If the string value of the cell is equal to any one of these values, the cell value is null.
Note it says cell value here. It also doesn't say what semantics a null cell value carries. Therefore, I'd say that RMLMapper is free to react to a cell value being null in whichever way it chooses.
So, perhaps this behaviour could be configured?
For example, RMLMapper could react to null cell values by rejecting the specific assertions (my use-case), or by rejecting the entire row (your use-case, I guess).
Would that sound like a reasonable approach?
@paulmillar Oh good catch!
However, I do still want to generate an instance for this row, with all the non-empty cells contributing assertions.
Commit d14ac9d does something different. It suppresses the entire row if any of the cells contains the csvw:null value.
I can imagine this could be useful under certain circumstances; however, it's a different use-case from mine.
Going back to the definition of csvw:null:
An atomic property giving the string or strings used for null values within the data. If the string value of the cell is equal to any one of these values, the cell value is null.
Note it says cell value here. It also doesn't say what semantics a null cell value carries. Therefore, I'd say that RMLMapper is free to react to a cell value being null in whichever way it chooses.
So, perhaps this behaviour could be configured?
This doesn't need to be configured as the behavior doesn't properly match what was intended with csvw:null :)
It should behave like you said: ignore the cell instead of the whole row.
Would you like to hack on this as you have a nice use case? Or do I make an internal issue so that somebody can have a look?
Hmmm...
I don't know if I'm missing something here but, looking at functional test RMLTC1002a_null-CSVW, it seems like skipping the row is the behaviour @winniederidder intended.
Perhaps we should come up with a consensus view on what effect csvw:null should have, just to make sure everyone's happy.
In terms of hacking on this, yes, I'd be happy to; however, I can only work on this in my spare time, which is (at the moment) very limited and unpredictable. So, I wouldn't want to promise anything!
@DylanVanAssche Was the intended behaviour not ignoring the entire row? Else the column containing a null value should be set to null, which of course isn't too difficult and can easily be added. But ignoring is how I understood the original issue at least.
@winniederidder I misread the original issue when checking the MR.
Else the column containing a null value should be set to null, which of course isn't too difficult and can easily be added.
Let's change the behavior to this :)
We can use withNullString() to set the NULL string to the first value of csvw:null and before processing the CSV file, we replace all other possible NULL values provided by csvw:null with the first value of csvw:null.
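The idea of collapsing several null strings into one can be sketched roughly as follows. This is a Python illustration only, not the Java code that was pushed: parse_with_single_null is a hypothetical stand-in for a parser configured with a single null string, and the sketch works on already-split cells rather than on the raw file contents.

```python
import csv
import io


def normalise_nulls(rows, null_strings):
    """Replace every alternative null string with the first (canonical) one."""
    canonical = null_strings[0]
    alternatives = set(null_strings[1:])
    return [
        [canonical if cell in alternatives else cell for cell in row]
        for row in rows
    ]


def parse_with_single_null(rows, null_string):
    """Hypothetical stand-in for a parser that knows one null string only:
    cells equal to that string become None."""
    return [[None if cell == null_string else cell for cell in row] for row in rows]


def read_rows(text, null_strings, delimiter=";"):
    """Read all rows, treating any of the given null strings as null."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not null_strings:
        return rows  # no csvw:null values declared, nothing to replace
    normalised = normalise_nulls(rows, null_strings)
    return parse_with_single_null(normalised, null_strings[0])
```

For example, calling read_rows with the null strings "", "N.A." and "-" turns every cell holding any of those three values into None, even though the stand-in parser only ever compares against the first one.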
@paulmillar Don't worry about it ;) We will fix this. I just wanted to avoid that we both do the same work.
OK, thanks.
@paulmillar We pushed some new commits to development on Github, feel free to check them out and let us know if your issue is resolved :)
Hi @DylanVanAssche ,
I've checked the development branch and it works perfectly for me.
Thanks again!
Awesome! Will be available in the next release then :)
Motivation:
In tabular data representations, one problem is how to represent the absence of information; for example, if a field does not apply to all rows, what value should be placed in the cell where the information is either missing or not applicable?
A common solution is to use a place-holder value that represents the absence of information. Examples of such place-holder values include the empty string (""), a dash ("-") or a phrase (or abbreviation thereof) such as "N.A.".
It would be useful if such place-holder values were identified as such and RMLMapper refrained from making any corresponding assertions.
The CSVW namespace [1] provides metadata describing a CSV file. Within RMLMapper, this may be used to configure the CSV parser. One feature of CSVW is its ability to describe how certain values correspond to the null value. This is useful as RMLMapper will not include triples where the object value is null.
Although RMLMapper provides partial support for CSVW, this currently lacks support for csvw:null assertions.
[1] https://www.w3.org/ns/csvw
Modification:
Add limited support for csvw:null assertions.
CSVW supports potentially multiple csvw:null assertions; however, Commons CSV parser only supports a single null String (see [2]). Therefore, support for csvw:null is only partial.
[2] https://issues.apache.org/jira/browse/CSV-293
Result:
A CSV-backed mapping may now be defined with a specific string identified as a place-holder indicating missing information. If a cell contains the place-holder value then any corresponding assertions are suppressed. This is achieved using the csvw:null assertion; however, please note that current support is limited to a single csvw:null assertion; any subsequent assertions are silently ignored.