dputhier / libgtftk

gtftk C Library and program
GNU General Public License v3.0

Question regarding select_by_key #71

Closed dputhier closed 6 years ago

dputhier commented 6 years ago

Hi, I was wondering what the behaviour of libgtftk.select_by_key is when invert_match is set to 2. I guess it implies that the function returns the record iff the key exists for that record. If the key exists but its value is set to ".", the record will be returned (please confirm). This is what is expected, and I should implement select_by_regexp the same way. However, I have no way to know whether a key exists for a record. The only thing I can do is extract a column (using extract_data) and check whether it contains a ".", which does not tell me whether the key exists for that record or not... This argues for a distinct encoding of the two cases when using extract_data. We should be able to distinguish a key that is missing from a key that exists with the value ".".

On my side I would like to implement two different args on select_by_reg_exp, select_by_reg_key, and extract_data: no_na (controls whether I want any "." in the output) and if_key_exists (perform the test only if the key exists). This would require several modifications on the Python side. I don't know about the C side. Let me know. We can discuss it next week.

dputhier commented 6 years ago

Trying to be a little more explicit. My wish would be an additional argument in libgtftk.extract_data (e.g. 'explicit') to be more explicit regarding ".". If explicit is set to true (default false for backward compatibility), then records for which the key does not exist would return something like "??" or "^$". I guess these kinds of values should be rare...

fafa13 commented 6 years ago

I made a very small change in extract_data to distinguish missing attributes from "." values. Now, if an attribute is missing, its value is "?" in the results of extract_data. I think the "explicit" parameter can be implemented on the Python side. It's probably easier than in C ...

dputhier commented 6 years ago

It also adds some complexity to my Python code, as extract_data returns many types of objects (dict, list, list of lists...). But I will do it in Python anyway.
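
For illustration, here is a minimal sketch of how the "explicit" switch might be handled on the Python side. It assumes the results arrive as nested dicts and lists of strings and that "?" marks a missing key, as described above. The helper name apply_explicit and the two constants are hypothetical; they are not part of libgtftk or its Python wrapper.

```python
MISSING = "?"    # assumed marker: the key is absent from the record
UNDEFINED = "."  # the key exists but its value is undefined


def apply_explicit(result, explicit=False):
    """Return a copy of `result`, folding "?" into "." unless explicit is True.

    Walks dicts, lists and tuples recursively so the same helper covers
    every container shape mentioned in this thread.
    """
    if isinstance(result, dict):
        return {k: apply_explicit(v, explicit) for k, v in result.items()}
    if isinstance(result, (list, tuple)):
        return type(result)(apply_explicit(v, explicit) for v in result)
    if result == MISSING and not explicit:
        # Backward-compatible default: callers keep seeing "." only.
        return UNDEFINED
    return result
```

With explicit left at its default, a list of lists such as [["a", "?"], [".", "b"]] would come back with every "?" replaced by ".", while explicit=True leaves the structure untouched.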

dputhier commented 6 years ago

This feature is available starting from the 0.9.0 release. Records for which the key does not exist return "?".
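
As a rough sketch of what this makes possible on the Python side: given one column of extracted values, a caller can now keep only the records where the key exists, and optionally drop "." values as well. The function select_indices and its no_na argument below are hypothetical illustrations of the idea discussed earlier in this thread, not the actual API.

```python
MISSING = "?"    # key absent from the record (0.9.0 behaviour described above)
UNDEFINED = "."  # key present but value undefined


def select_indices(values, no_na=False):
    """Return indices of records whose key exists.

    If no_na is True, records whose value is "." are skipped as well.
    """
    kept = []
    for i, value in enumerate(values):
        if value == MISSING:
            continue  # the key does not exist for this record
        if no_na and value == UNDEFINED:
            continue  # the key exists but carries no usable value
        kept.append(i)
    return kept


values = ["gene1", ".", "?", "gene2"]
print(select_indices(values))              # key exists: indices 0, 1, 3
print(select_indices(values, no_na=True))  # key exists and value set: 0, 3
```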