Rothamsted / knetbuilder

KnetBuilder data integration platform for building knowledge graphs. Previously known as ondex.
https://knetminer.com
MIT License
12 stars, 11 forks

TAB Parser behaviour for blank entries in a column #31

Closed: KeywanHP closed this issue 2 years ago

KeywanHP commented 4 years ago

The TAB parser snippet below currently creates an "empty" concept even for blank entries in the given column. The desired behaviour is to ignore blank entries.

    <concept id = "gene">
        <class>Gene</class>
        <data-source>ARAGWAS</data-source>
        <accession data-source="TAIR">
            <column index='1'/>
        </accession>
    </concept>
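For illustration, here is a minimal Java sketch of the requested behaviour, assuming rows arrive as tab-separated strings: one Gene concept per row, but only when the accession cell is non-blank. It is not KnetBuilder's TAB parser code; the class `BlankCellConceptSketch`, the `Concept` record and the `geneConcepts` method are hypothetical, and treating the column index as 0-based is an assumption (records need Java 16+).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not KnetBuilder's TAB parser: mirrors the config
// snippet above (Gene class, ARAGWAS data source, TAIR accession column).
class BlankCellConceptSketch {

    /** Minimal stand-in for a parsed concept and its single accession. */
    record Concept(String conceptClass, String dataSource,
                   String accession, String accessionSource) {}

    /**
     * Creates one Gene concept per row, skipping rows whose accession cell
     * is missing, empty or whitespace-only (so no "empty" concepts).
     * accessionColumn is treated as a 0-based index (an assumption).
     */
    static List<Concept> geneConcepts(List<String> rows, int accessionColumn) {
        List<Concept> concepts = new ArrayList<>();
        for (String row : rows) {
            // limit -1 keeps trailing empty cells: "snp2\t" -> ["snp2", ""]
            String[] cells = row.split("\t", -1);
            if (accessionColumn >= cells.length) continue; // column absent in this row
            String accession = cells[accessionColumn].strip();
            if (accession.isEmpty()) continue; // blank cell: create nothing
            concepts.add(new Concept("Gene", "ARAGWAS", accession, "TAIR"));
        }
        return concepts;
    }
}
```

For example, `geneConcepts(List.of("snp1\tAT1G01010", "snp2\t"), 1)` yields a single concept, for AT1G01010; the blank cell in the second row produces nothing rather than a concept without properties.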
marco-brandizi commented 3 years ago

@KeywanHP, @jparsons2222, could someone please test it with real data and close this issue if it works?

KeywanHP commented 3 years ago

This issue still exists. It is not tied to a specific dataset; it is a generic TAB parser enhancement request. For blank cells, the current logic seems to create a concept without any properties (a "ghost" concept). The better logic would be to create concepts only for non-empty cells.

Do you agree @marco-brandizi @jparsons2222 ?

marco-brandizi commented 3 years ago

@KeywanHP, it should already work this way, according to the unit tests that cover it:

Sample file, config file, and test about missing values

As you can see,

Another test (testBlankLinesFilter()) has been in place for a long time to verify that completely empty rows are ignored.

Shouldn't it behave this way? Do you have an example, together with how you expect it to be parsed? Are you using the latest version?

EDIT: the unit test mentioned above had a typo (in conceptFinder). I've fixed it, and the test still passes without detecting errors for the behaviours I've expected so far.
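The row-level check that testBlankLinesFilter() is described as covering can be sketched in Java as follows. This is a hypothetical illustration, not the KnetBuilder implementation; `BlankRowFilterSketch`, `isBlankRow` and `nonBlankRows` are made-up names, and the input is assumed to be tab-separated text (Java 11+ for `isBlank()` and `lines()`).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of filtering out completely empty rows;
// not taken from the KnetBuilder code base.
class BlankRowFilterSketch {

    /**
     * A row counts as completely empty when all its cells are blank.
     * Since '\t' is itself whitespace, that is the same as the whole
     * line being blank.
     */
    static boolean isBlankRow(String line) {
        return line.isBlank();
    }

    /** Keeps only the lines of the TSV text that have at least one non-blank cell. */
    static List<String> nonBlankRows(String tsv) {
        return tsv.lines()
                  .filter(line -> !isBlankRow(line))
                  .collect(Collectors.toList());
    }
}
```

A row such as `snp2\t` passes this filter, since its first cell is non-blank. Skipping completely empty rows therefore does not by itself stop a concept from being created for a blank cell in a row that has other data; that needs a separate per-cell check, which is what the issue asks for.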

KeywanHP commented 3 years ago

This is great, exactly the behaviour I would expect. We have been using KnetBuilder 3.0, so we should switch to the latest version now! @jparsons2222

marco-brandizi commented 2 years ago

Closing as fixed.