IMCR-Hackathon / datapie

Data Package Interface for Evaluation ("Easy as pie!")
https://imcr-hackathon.github.io/datapie/
MIT License

use_missing_code() logic #61

Closed: clnsmth closed this issue 5 years ago

clnsmth commented 5 years ago

Hi @atn38, please test and revise use_missing_code() when you get a chance. The logic doesn't seem quite right and was resulting in data_package_read() errors. You'll have to uncomment this block of code in data_package_read() when it's working again. Thanks!

atn38 commented 5 years ago

Thanks, Colin, for flagging the issue. What are the data_package_read() errors? I'd also appreciate the DOI of a data package that triggers them.

On Mon, Jul 29, 2019 at 3:21 PM, Colin Smith wrote:

Assigned #61 (https://github.com/IMCR-Hackathon/datapie/issues/61) to @atn38.

clnsmth commented 5 years ago

Sure thing @atn38.

The logic at line 16 of use_missing_code() is written such that the code within that if statement will never run; specifically, it uses in rather than %in%. These errors only occur for data packages containing attribute_metadata objects with missingValueCode (i.e., the outer if logic is working correctly).
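For reference, in R the keyword in belongs to for loop syntax, while %in% is the membership operator that returns TRUE or FALSE. Below is a minimal sketch of a membership check inside an if statement. It assumes a hypothetical attribute_metadata data frame with attributeName and missingValueCode columns; the actual object in datapie may be structured differently.

```r
# Hypothetical stand-in for one table's attribute metadata; the column
# names are assumptions, not necessarily datapie's actual structure.
attribute_metadata <- data.frame(
  attributeName    = c("site", "temp_c", "depth_m"),
  missingValueCode = c(NA, "-9999", "NaN"),
  stringsAsFactors = FALSE
)

# `%in%` tests membership and returns a logical value, so it works in `if`.
has_codes <- "missingValueCode" %in% names(attribute_metadata)

if (has_codes) {
  # Keep only the attributes that actually declare a missing value code.
  declared <- attribute_metadata[!is.na(attribute_metadata$missingValueCode), ]
  print(declared$attributeName)
}
```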

doi:10.6073/pasta/c964ed49ff284dfcaaf53719651da60f (works)
doi:10.18739/A2DP3X (errors)

Error:

Error in matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn,  : 
  length of 'dimnames' [2] not equal to array extent
Called from: matrix(if (is.null(value)) logical() else value, nrow = nr, dimnames = list(rn, 
    cn))

While many of datapie's functions aren't readily testable, some are, and this one seems like it could be. Here is a great resource on writing unit tests for R. I'll bring up the need for unit testing in our Thursday meeting.
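As a rough illustration of what such a test could look like, here is a sketch built around a hypothetical helper, replace_missing_code(), which stands in for the core of use_missing_code(). Its name, signature, and behaviour are assumptions for illustration and not the real datapie function. The checks use base R so they run without extra packages, and the same expectation is repeated with testthat when that package is installed.

```r
# Hypothetical helper standing in for the core of use_missing_code();
# the name and signature are assumptions for illustration only.
replace_missing_code <- function(x, code) {
  # Compare as character so numeric columns match codes stored as text.
  x[!is.na(x) & as.character(x) == as.character(code)] <- NA
  x
}

# Base-R checks that run without extra packages.
stopifnot(identical(replace_missing_code(c("a", "-9999", "b"), "-9999"),
                    c("a", NA, "b")))
stopifnot(identical(replace_missing_code(c(1, -9999, 3), "-9999"),
                    c(1, NA, 3)))

# The same expectation written as a testthat test, if testthat is installed.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("declared codes become NA", {
    testthat::expect_equal(replace_missing_code(c(1, -9999, 3), "-9999"),
                           c(1, NA, 3))
  })
}
```

Keeping the replacement logic in a small function that takes plain vectors, rather than a whole data package, is what makes it testable without downloading anything.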

atn38 commented 5 years ago

@clnsmth, thanks for the suggestion. I followed it, but it didn't seem to work. Then I found a mismatch in the second data table of doi:10.18739/A2DP3X between the attributeName values listed in the metadata and the column names in the data. This was the source of the error. I'll rewrite use_missing_code() so that it doesn't rely on the names in the two places always matching. It will probably include fuzzy matching and/or order matching of some sort.
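One possible shape for that matching step is sketched below. It assumes a hypothetical helper, match_attributes(), that takes the metadata attributeName values and the data column names as character vectors. This is not the rewrite that ended up in datapie, only an illustration of name matching with an order-based fallback.

```r
# Hypothetical helper (name and behaviour are assumptions): map each
# metadata attributeName to a column position in the data table.
match_attributes <- function(attribute_names, column_names) {
  # 1. Exact match after trimming whitespace and ignoring case.
  norm <- function(x) tolower(trimws(x))
  idx <- match(norm(attribute_names), norm(column_names))

  # 2. Order-based fallback: when metadata and data list the same number of
  #    columns, place an unmatched attribute at its own position, provided
  #    that position was not already claimed by a name match.
  if (length(attribute_names) == length(column_names)) {
    free <- is.na(idx) & !(seq_along(idx) %in% idx)
    idx[free] <- which(free)
  }

  idx  # NA marks attributes that could not be placed
}

positions <- match_attributes(
  c("Site", "temp_c", "depth"),
  c("site", "temp_c", "depth_m")
)
print(positions)
```

Here "depth" has no name match in the data, so it falls back to the third position. Approximate string matching, for example with base R's agrep() or adist(), could be added as an intermediate step before the positional fallback.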

clnsmth commented 5 years ago

Hi @atn38, the above example will not reproduce the error on the development branch because I commented out the call to use_missing_code() in the data_package_read() function (see https://github.com/IMCR-Hackathon/datapie/issues/61#issue-474238099).

Yes, incongruence between data and metadata column names makes programmatic workflows challenging! This is a prime example of the role that quality metadata plays in data reuse!