airr-community / airr-standards

AIRR Community Data Standards
https://docs.airr-community.org
Creative Commons Attribution 4.0 International

Specify types for custom columns in Rearrangements TSV? #433

Open scharch opened 4 years ago

scharch commented 4 years ago

Ideally, I would be able to do this with a simple dictionary like { 'columnX':int, 'columnY':boolean }, but I could settle for passing in a full auxiliary schema if that's considered the proper way to go about it.

javh commented 4 years ago

Do you mean as arguments to read/write/validate in the Python library?

scharch commented 4 years ago

Yeah, so I can do something like if row['columnX'] > 0. Obviously I can just cast it explicitly, but it seems like it might be a useful feature generally.
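
A minimal sketch of what the request amounts to, written with the standard csv module rather than the airr library. The names AUX_TYPES, cast_value, and read_rows are hypothetical, not part of any existing API. It assumes booleans are encoded as T/F, as in AIRR TSV files, and treats empty strings as null.

```python
import csv
import io

# Hypothetical mapping from custom column name to Python type.
AUX_TYPES = {"columnX": int, "columnY": bool}

def cast_value(value, target):
    """Convert one TSV string to the requested type; empty strings become None."""
    if value == "":
        return None
    if target is bool:
        # Assumes the T/F boolean encoding used by AIRR TSV files.
        return {"T": True, "F": False}[value]
    return target(value)

def read_rows(handle, aux_types):
    """Yield rows with the listed columns converted; other columns stay strings."""
    for row in csv.DictReader(handle, delimiter="\t"):
        for name, target in aux_types.items():
            if name in row:
                row[name] = cast_value(row[name], target)
        yield row

data = "sequence_id\tcolumnX\tcolumnY\nseq1\t3\tT\nseq2\t0\tF\n"
rows = list(read_rows(io.StringIO(data), AUX_TYPES))

# The comparison now works without an explicit cast at the call site.
positive = [r["sequence_id"] for r in rows if r["columnX"] > 0]
```

With the conversion done at read time, downstream code can compare row['columnX'] directly instead of wrapping each access in int().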

schristley commented 3 years ago

Is the field type sufficient, or would we see a circumstance where other attributes might be desirable? nullable is one example. If we don't really know but want to allow for the possibility in the future, we could just alter that simple dictionary slightly like this. Then we could support additional properties easily.

{ 'columnX': { type: int }, 'columnY': { type: boolean }}

scharch commented 3 years ago

Yeah, I could see how that could be desirable. Flexible is good, anyway, and it doesn't seem to cost anything in this case.
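
One way the nested form could be handled, as a sketch only. AUX_SCHEMA, normalize, and convert are hypothetical names, and defaulting nullable to True is an assumption made here, not something decided in this thread. The normalize step accepts both the flat and the nested dictionary, so the simple form keeps working.

```python
# Hypothetical nested spec; "nullable" is the extra attribute mentioned above.
AUX_SCHEMA = {
    "columnX": {"type": int, "nullable": False},
    "columnY": {"type": bool},
}

def normalize(spec):
    """Accept {'col': int} or {'col': {'type': int, ...}} and return the nested form."""
    out = {}
    for name, entry in spec.items():
        if not isinstance(entry, dict):
            entry = {"type": entry}
        # Assumption: columns are nullable unless the spec says otherwise.
        out[name] = {"nullable": True, **entry}
    return out

def convert(value, field):
    """Convert one TSV string according to a normalized field spec."""
    if value == "":
        if not field["nullable"]:
            raise ValueError("empty value in non-nullable column")
        return None
    if field["type"] is bool:
        # Assumes the T/F boolean encoding used by AIRR TSV files.
        return {"T": True, "F": False}[value]
    return field["type"](value)

schema = normalize(AUX_SCHEMA)
```

Adding another attribute later only means reading one more key in convert; the calling code does not change.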

bcorrie commented 3 years ago

This would be a nice extension - to make sure I understand, this is for the validate libraries, correct? So you can say to the python validate functions, here is an AIRR TSV file with custom columns, and here is the schema for those custom columns, please validate the file against the AIRR schema and the custom schema... Correct?
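
A rough sketch of the custom-column half of that check, using only the standard csv module. The function names and the error tuple layout are hypothetical. It covers the custom schema only; validation against the AIRR schema itself would still be done by the library.

```python
import csv
import io

# Hypothetical mapping from custom column name to Python type.
AUX_TYPES = {"columnX": int, "columnY": bool}

def is_valid(value, target):
    """Return True if the string can be read as the target type."""
    if value == "":
        return True  # Assumption: empty values count as null and are allowed.
    if target is bool:
        return value in ("T", "F")
    try:
        target(value)
    except (TypeError, ValueError):
        return False
    return True

def validate_aux(handle, aux_types):
    """Return (line_number, column, value) for every custom value that fails."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    # A custom column missing from the header is reported against line 1.
    errors = [(1, name, None) for name in aux_types if name not in header]
    for line_no, row in enumerate(reader, start=2):
        for name, target in aux_types.items():
            if name in row and not is_valid(row[name], target):
                errors.append((line_no, name, row[name]))
    return errors

data = "sequence_id\tcolumnX\tcolumnY\nseq1\t3\tT\nseq2\tabc\tmaybe\n"
errors = validate_aux(io.StringIO(data), AUX_TYPES)
```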

schristley commented 3 years ago

Yes, but also the read/write functions so it does automatic type conversion.
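
On the write side, the conversion runs in the opposite direction. A small sketch with the standard csv module follows; to_tsv and write_rows are hypothetical helpers, and the T/F and empty-string encodings are assumed to match what the reader expects.

```python
import csv
import io

def to_tsv(value):
    """Turn a typed Python value back into its TSV string form."""
    if value is None:
        return ""
    if isinstance(value, bool):
        # Checked before the generic case so True is not written as "True".
        return "T" if value else "F"
    return str(value)

def write_rows(handle, rows, fieldnames):
    """Write typed rows as tab-separated text with a header line."""
    writer = csv.DictWriter(
        handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow({name: to_tsv(row.get(name)) for name in fieldnames})

buffer = io.StringIO()
write_rows(
    buffer,
    [
        {"sequence_id": "seq1", "columnX": 3, "columnY": True},
        {"sequence_id": "seq2", "columnX": None, "columnY": False},
    ],
    ["sequence_id", "columnX", "columnY"],
)
text = buffer.getvalue()
```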

imkeller commented 3 years ago

Could someone please share an example file for which there are custom columns or columns where the type cannot be guessed? I didn't find any in the test data. The R package performs as expected for good_data.tsv and bad_data.tsv. What is the role of extra_data.tsv? There is a parsing error, but I think it does not reflect the column type problem.

scharch commented 1 year ago

Leaving this open, as the functionality has only been implemented in R, not Python.

bcorrie commented 6 months ago

@scharch still v2.0 issue?

scharch commented 6 months ago

I mean, it would be nice for the python and R functionalities to match...

javh commented 6 months ago

This was implemented via the aux_types argument to read_tabular.

scharch commented 6 months ago

@javh I believe this should be reopened, as there is not yet any corresponding functionality in the Python RearrangementReader.

javh commented 6 months ago

Okiee. Reopened. I missed that this was a python library issue and not an R library issue (for no good reason, just not paying attention).