ReproNim / reproschema-py

Apache License 2.0

[wip] updating reproschema commands to the new pydantic model #36

Closed djarecka closed 5 months ago

djarecka commented 7 months ago

@satra - for now I updated only validate, but can you please take a look if this is what you had in mind?

also, I wasn't sure where the context should be; for now I added it to models

satra commented 7 months ago

looks like a reasonable approach. a few comments.

djarecka commented 7 months ago

ok, I understand that all the changes we were making to the pydantic model, including changing the type langString to Dict[str, str], should be done here in model.py.

djarecka commented 7 months ago

@satra - should we include in reproschema something like _ldmeta (from dandi schema) to record that a class is reproschema:Protocol, etc.? Something like that would be needed if we want to keep tests like this

satra commented 7 months ago

you already have that in the meaning/mapping/schema_uri components of the linkml schema. the generator does not add that to pydantic, and that may be good to add to the metadata. i thought you or puja was going to look into how to add other metadata to the generated code on the linkml side.

more specifically, those tests are really not helpful at the moment :)

djarecka commented 7 months ago

> you already have that in the meaning/mapping/schema_uri components of the linkml schema. the generator does not add that to pydantic, and that may be good to add to the metadata. i thought you or puja was going to look into how to add other metadata to the generated code on the linkml side.

Yes, indeed, there is a current PR that should add linkml_metadata automatically

djarecka commented 6 months ago

> could we also do the things we discussed earlier in the thread (#36 (comment))?

yes, I will work from the new redcap2reproschema scripts that @yibeichan updated

satra commented 6 months ago

@djarecka - do make sure there are round-trip tests: load -> save -> compare
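
As a rough illustration of the load -> save -> compare idea, here is a minimal sketch that uses only the standard library. It round-trips a plain dictionary through a JSON file; the `round_trip` helper and the example item are hypothetical and do not use the reproschema-py loaders or the pydantic models.

```python
import json
import tempfile
from pathlib import Path


def round_trip(obj: dict) -> dict:
    """Save obj as JSON to a temporary file, load it back, and return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "item.jsonld"
        path.write_text(json.dumps(obj, indent=4), encoding="utf-8")
        return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical item, used only for illustration.
item = {
    "@context": "https://example.org/context",
    "id": "item1",
    "category": "reproschema:Item",
    "prefLabel": {"en": "item1"},
}

# The test passes only if nothing was lost or altered on the way through the file.
assert round_trip(item) == item
```

A real test would presumably replace the two `json` calls with the project's own load and write functions, so that the comparison also exercises the model.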

djarecka commented 6 months ago

@ibevers - check test_schema now. Yesterday's failure was a combination of load_file incorrectly treating my paths as URLs and my providing the wrong link to the reproschema context.

There is at least one problem that I still have to fix when writing the JSON-LD from the pydantic object: if the value of the response is an integer, model_dump creates Decimal(0), which is not serializable.
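
One possible way around the Decimal problem, sketched with the standard library only: pass a `default` hook to `json.dumps` that converts `Decimal` values. The name `decimal_default` and the sample dictionary are assumptions for illustration; this is not the fix that ended up in the PR.

```python
import json
from decimal import Decimal


def decimal_default(value):
    """Fallback for json.dumps: convert Decimal to int or float."""
    if isinstance(value, Decimal):
        # Keep whole numbers as integers so Decimal(0) is written as 0, not 0.0.
        if value.is_finite() and value == value.to_integral_value():
            return int(value)
        return float(value)
    # Anything else is still reported as unserializable.
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


# Hypothetical dumped data containing a Decimal, as described above.
dumped = {"choices": [{"name": {"en": "No"}, "value": Decimal(0)}]}
text = json.dumps(dumped, default=decimal_default)
```

The hook is only called for objects that `json` cannot serialize on its own, so ordinary integers, strings, and lists are unaffected.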

djarecka commented 6 months ago

> @djarecka - do make sure there are round-trip tests: load -> save -> compare

yes, I just had some issues and wanted to push it to share unfinished tests with Isaac. You can see some tests now in test_schema.

Also, I've slightly changed load_file since it was causing issues for my tests (not sure why it wasn't causing issues earlier) and the logic was not clear to me. I think it tried to read my file as a URL. Now I just have an if/else for either a URL or a file, and use the server for a file. I hope I didn't miss any use case that you wanted to cover.
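
A minimal sketch of that kind of if/else, using only the standard library. The helpers `is_url` and `classify` are hypothetical names and are not the actual load_file code.

```python
from pathlib import Path
from urllib.parse import urlparse


def is_url(path_or_url: str) -> bool:
    """Return True for http(s) URLs, False for anything else."""
    parsed = urlparse(str(path_or_url))
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def classify(path_or_url: str) -> str:
    """Return 'url' or 'file', or raise if the input is neither."""
    if is_url(path_or_url):
        return "url"
    if Path(path_or_url).is_file():
        return "file"
    raise ValueError(f"{path_or_url} is not a valid URL or file path")
```

Checking the scheme explicitly, rather than just testing for a colon, keeps Windows paths such as `C:\data\schema` from being mistaken for URLs.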

djarecka commented 6 months ago

@yibeichan - when you have a chance you can check the new redcap2reproschema. There are still some things that I want to update (e.g. moving the creation of the directory structure for activities and items to write_obj_jsonld, as Satra suggested), but let me know if you have other suggestions

yibeichan commented 6 months ago

the result looks good! I didn't get errors this time

djarecka commented 6 months ago

@yibeichan - could you please check if new reproschema2redcap works for you?

djarecka commented 6 months ago

@yibeichan - so I fixed the way the activity_path is created in reproschema2redcap.py, but I also had to change the protocol in the tests

yibeichan commented 6 months ago

thank you! I just tested using the same b2ai data as last time. here is the error info; it looks like something related to validation? I did git pull your latest change. did I miss something here?

Found schema file: b2ai-redcap2rs/b2ai-redcap2rs/b2ai-redcap2rs_schema
Traceback (most recent call last):
  File "/Users/yibeichen/miniconda3/envs/tes-reproschema/bin/reproschema", line 33, in <module>
    sys.exit(load_entry_point('reproschema', 'console_scripts', 'reproschema')())
  File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/yibeichen/Desktop/test/reproschema-py/reproschema/cli.py", line 127, in reproschema2redcap
    rs2redcap(input_path_obj, output_csv_path)
  File "/Users/yibeichen/Desktop/test/reproschema-py/reproschema/reproschema2redcap.py", line 259, in main
    csv_data = get_csv_data(input_dir_path, contextfile, http_kwargs)
  File "/Users/yibeichen/Desktop/test/reproschema-py/reproschema/reproschema2redcap.py", line 158, in get_csv_data
    prot = Protocol(**parsed_protocol_json)
  File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/pydantic/main.py", line 176, in __init__
    self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 3 validation errors for Protocol
reproschema:category
  Extra inputs are not permitted [type=extra_forbidden, input_value='reproschema:Protocol', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden
reproschema:id
  Extra inputs are not permitted [type=extra_forbidden, input_value='b2ai-redcap2rs_schema', input_type=str]
    For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden
reproschema:order
  Extra inputs are not permitted [type=extra_forbidden, input_value=[{'id': '../activities/su...ers_adhd_adult_schema'}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden

djarecka commented 6 months ago

@yibeichan - I fixed some issues with context, but you have to create a new b2ai-redcap2rs with redcap2reproschema since context has changed.

Also, I tried a test that compares the CSV produced by running redcap2reproschema and then reproschema2redcap against the original one, and I see that some of the columns are empty. Perhaps some of them were left out as unimportant, but for others we should perhaps find a way to add them? One field I removed during the changes was requiredValue (row_data["required"] = "" # response_options.get("requiredValue", "")), since it was never a field of ResponseOption; it should be taken from the activity schema instead.

yibeichan commented 6 months ago

Hi, I created a new b2ai-redcap2rs, and this time, it worked!

I noticed that some columns are empty. For example, branching logic: I wasn't sure whether b2ai has branching logic, so I tested it on nimh-minimal. I cloned our nimh-minimal repo, ran reproschema2redcap, and got the following error:

reproschema reproschema2redcap nimh-minimal nimh-minimal.csv
Found schema file: None
Found schema file: nimh-minimal/nimh_minimal/nimh_minimal_schema
Traceback (most recent call last):
  File "/Users/yibeichen/miniconda3/envs/tes-reproschema/bin/reproschema", line 33, in <module>
    sys.exit(load_entry_point('reproschema', 'console_scripts', 'reproschema')())
  File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/yibeichen/Desktop/test/reproschema-py/reproschema/cli.py", line 127, in reproschema2redcap
    rs2redcap(input_path_obj, output_csv_path)
  File "/Users/yibeichen/Desktop/test/reproschema-py/reproschema/reproschema2redcap.py", line 260, in main
    csv_data = get_csv_data(input_dir_path, contextfile, http_kwargs)
  File "/Users/yibeichen/Desktop/test/reproschema-py/reproschema/reproschema2redcap.py", line 179, in get_csv_data
    item_json = load_file(
  File "/Users/yibeichen/Desktop/test/reproschema-py/reproschema/jsonldutils.py", line 94, in load_file
    raise Exception(f"{path_or_url} is not a valid URL or file path")
Exception: nimh-minimal/nimh_minimal/../activities/demo/https:/raw.githubusercontent.com/ReproNim/reproschema-library/80867e36fb2c00563290486bf3f3bbeb3198f5cb/activities/NDA/items/interview_age is not a valid URL or file path

In nimh-minimal, we cited items from reproschema-library, see this one; this might need extra work for parsing.
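
The `https:/` with a single slash in the message above is consistent with an absolute URL having been joined onto a local directory with path utilities. A minimal sketch of one way to avoid that, assuming a hypothetical helper `resolve_reference` (not part of reproschema-py): return the reference unchanged when it is already an absolute URL, and only join relative references to the base directory.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def resolve_reference(base_dir: str, ref: str) -> str:
    """Return ref unchanged if it is an absolute http(s) URL, else join it to base_dir."""
    parsed = urlparse(ref)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ref
    # Relative reference: keep it relative to the directory of the referring file.
    return str(PurePosixPath(base_dir) / ref)
```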

yibeichan commented 6 months ago

hello hello, I was using redcap2reproschema to convert HBCD redcap csv to reproschema and got the following errors, it looks like something related to input type = list. sent this csv to you on slack

Error: Error during conversion: 2 validation errors for Item
ui.hidden
  Extra inputs are not permitted [type=extra_forbidden, input_value=True, input_type=bool]
    For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden
scoringLogic
  Extra inputs are not permitted [type=extra_forbidden, input_value=[{'variableName': 'sed_bm...bm_demo_roster_002 ))"}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden

djarecka commented 6 months ago

> hello hello, I was using redcap2reproschema to convert HBCD redcap csv to reproschema and got the following errors, it looks like something related to input type = list. sent this csv to you on slack
>
> Error: Error during conversion: 2 validation errors for Item
> ui.hidden
>   Extra inputs are not permitted [type=extra_forbidden, input_value=True, input_type=bool]
>     For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden
> scoringLogic
>   Extra inputs are not permitted [type=extra_forbidden, input_value=[{'variableName': 'sed_bm...bm_demo_roster_002 ))"}], input_type=list]
>     For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden

These fields are simply not part of the reproschema model, but they are added in redcap2reproschema. So this is not related to the pydantic version; the logic in the code is outdated and should be updated. Perhaps scoringLogic is compute?

djarecka commented 5 months ago

> In nimh-minimal, we cited items from reproschema-library, see this one; this might need extra work for parsing.

I fixed the code so that it works for URLs; however, in one place ResponseOption is provided as a URL to another JSON-LD file. This is not supported; if we want to have it, we would have to change the code. And it's not clear to me whether the model should be changed to allow a URL as an option for ResponseOption. (cc: @satra)

satra commented 5 months ago

responseOptions can be a url (see: https://github.com/ReproNim/reproschema-library/blob/enh/rc4/activities/GAD7/items/gad7_1#L38) and that pattern is used in many schemas.

yibeichan commented 5 months ago

ah, I was thinking about this line here, we cited "demo": "https://raw.githubusercontent.com/ReproNim/reproschema-library/80867e36fb2c00563290486bf3f3bbeb3198f5cb/activities/NDA/items/" I thought this was the one that caused the error

yibeichan commented 5 months ago

wait, I saw Dorota pushed a new commit regarding the context URL, I'll check later today.

satra commented 5 months ago

> ah, I was thinking about this line here, we

that line is also valid and should be respected in any loader.
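
To make the prefix case concrete, here is a simplified sketch of how a loader could expand a compact reference such as `demo:interview_age` using a context entry like the one quoted above. The function `expand_compact_iri` is hypothetical, and a full JSON-LD processor does considerably more than this.

```python
def expand_compact_iri(value: str, context: dict) -> str:
    """Expand 'prefix:suffix' using string-valued prefix definitions in context."""
    prefix, sep, suffix = value.partition(":")
    # Leave absolute URLs ("https://...") and unprefixed values untouched.
    if not sep or suffix.startswith("//"):
        return value
    base = context.get(prefix)
    # Unknown prefixes are returned unchanged rather than guessed at.
    return base + suffix if isinstance(base, str) else value


context = {
    "demo": "https://raw.githubusercontent.com/ReproNim/reproschema-library/80867e36fb2c00563290486bf3f3bbeb3198f5cb/activities/NDA/items/"
}
expanded = expand_compact_iri("demo:interview_age", context)
```

With this context, `expanded` is the raw.githubusercontent.com URL that appears in the error message earlier in the thread, which can then be fetched as a URL rather than joined onto a local path.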

yibeichan commented 5 months ago

> These fields are simply not part of the reproschema model, but they are added in redcap2reproschema. So this is not related to the pydantic version; the logic in the code is outdated and should be updated. Perhaps scoringLogic is compute?

yes, you mean this line https://github.com/djarecka/reproschema-py/blob/86f3699fdda55802478208b5a376dde0467c1d3b/reproschema/redcap2reproschema.py#L186, right? it should be compute if compute is part of the reproschema model

djarecka commented 5 months ago

> responseOptions can be a url (see: https://github.com/ReproNim/reproschema-library/blob/enh/rc4/activities/GAD7/items/gad7_1#L38) and that pattern is used in many schemas.

Should I change the loader so that it puts the content of the additional URL into the data? What are the other fields that should be expanded?

Perhaps it should be done in a new PR? I don't think it was supported in the current version.

Or should I just fix reproschema2redcap so that it loads the additional file?
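
For the first option (expanding in the loader), a rough sketch could look like the following. Both `inline_response_options` and the `fetch` callable are hypothetical; `fetch` stands in for whatever function retrieves the text of a URL or file, and is passed in so that the sketch makes no network calls itself.

```python
import json
from typing import Callable


def inline_response_options(item: dict, fetch: Callable[[str], str]) -> dict:
    """Return a copy of item in which a string-valued responseOptions is replaced by the fetched document."""
    options = item.get("responseOptions")
    if not isinstance(options, str):
        # Already an inline object (or missing): nothing to expand.
        return dict(item)
    expanded = dict(item)
    expanded["responseOptions"] = json.loads(fetch(options))
    return expanded


# Usage with a stand-in fetcher backed by a dictionary instead of the network.
fake_documents = {"https://example.org/valueConstraints": '{"valueType": "xsd:integer"}'}
item = {"id": "gad7_1", "responseOptions": "https://example.org/valueConstraints"}
result = inline_response_options(item, fake_documents.__getitem__)
```

Which other fields would need the same treatment is exactly the open question above; the sketch only handles responseOptions.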

djarecka commented 5 months ago

> > These fields are simply not part of the reproschema model, but they are added in redcap2reproschema. So this is not related to the pydantic version; the logic in the code is outdated and should be updated. Perhaps scoringLogic is compute?
>
> yes, you mean this line https://github.com/djarecka/reproschema-py/blob/86f3699fdda55802478208b5a376dde0467c1d3b/reproschema/redcap2reproschema.py#L186, right? it should be compute if compute is part of the reproschema model

Yes it does have compute, see here, and it was not changed by me, compute has been introduced much earlier.

But, I'm not sure about hidden

djarecka commented 5 months ago

@yibei - converting nimh-minimal should work now.

as for the scoringLogic, perhaps you can help me tomorrow. The code doesn't look good to me, and I'm pretty sure that it doesn't do what it was meant to, so we could try to fix it together.

djarecka commented 5 months ago

@yibeichan - please check how this works for your examples. As I mentioned, I believe there were multiple parts that were not working as intended, which I tried to fix; hopefully I didn't break something else.

A few points that I wasn't sure about:

@satra, a few questions for you: there are some parts of the code that are commented out or not used, and I'm not really sure how they would fit into the current schema. Should I remove them completely, or are we missing something in the schema (in that case I would open an issue so we don't forget)? I'm talking about:

satra commented 5 months ago

isVis should be based on branching logic if it says show. i believe we were using calc to say a field should not be shown. so if both are true for a field, we need to ensure that both calc is there in compute and isVis is dependent on branching logic. we should see a few examples where this is true.

matrix_group_name/count should stay (we may use that later in the ui component). i don't know what identifiable does. would need to see an example (perhaps it's in the abcd schema or in the umass schema).

djarecka commented 5 months ago

> isVis should be based on branching logic if it says show. i believe we were using calc to say a field should not be shown. so if both are true for a field, we need to ensure that both calc is there in compute and isVis is dependent on branching logic. we should see a few examples where this is true.

and what about when it is "calc", i.e. compute in ReproSchema? Previously the code was using hidden, but we thought that it would now be isVis in the Activity schema.

> matrix_group_name/count should stay (we may use that later in the ui component). i don't know what identifiable does. would need to see an example (perhaps it's in the abcd schema or in the umass schema).

ok, will keep for now

yibeichan commented 5 months ago

here are some examples where both calc and branching logic exist. I think we can still keep these items hidden but add branching logic to the activity schema where those items are. basically, when both calc and branching logic exist, it means we do different calculations based on different conditions, but they are still calculations. [image]

for Text Validation Min, here are some examples. can we use Text Validation Type to help us set them up in reproschema? e.g., we set up valueType along with minValue and maxValue? [image]

satra commented 5 months ago

> and what about when it is "calc", i.e. compute in ReproSchema.

phq9 total score is a calculated item, so that activity schema is a good example for what to do with a calc only item and isVis.
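
One possible reading of the rule discussed in this part of the thread, written as a small sketch. It follows yibeichan's suggestion that calc items stay hidden even when branching logic is present; `is_vis` is a hypothetical helper, and translating REDCap branching logic into an expression the UI can evaluate is out of scope here.

```python
def is_vis(field_type: str, branching_logic: str = ""):
    """Return an isVis value for one field of an activity.

    Assumed rule: calc fields are hidden, fields with branching logic are
    shown conditionally, everything else is always shown.
    """
    if field_type == "calc":
        return False  # calculated items stay hidden, with or without branching logic
    condition = branching_logic.strip()
    if condition:
        return condition  # visibility depends on the (untranslated) condition
    return True
```

Under satra's earlier reading, a field with both calc and branching logic would instead get the condition as its isVis value, so the first branch is the part to revisit if that interpretation wins.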

yibeichan commented 5 months ago

also, i guess because of what we mentioned above, I still got the same errors when converting HBCD

ui.hidden
  Extra inputs are not permitted [type=extra_forbidden, input_value=True, input_type=bool]
    For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden
scoringLogic
  Extra inputs are not permitted [type=extra_forbidden, input_value=[{'variableName': 'sed_bm...bm_demo_roster_002 ))"}], input_type=list]
    For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden

djarecka commented 5 months ago

@yibeichan - did you pull the changes?

yibeichan commented 5 months ago

I did. [image]

djarecka commented 5 months ago

@yibeichan - perhaps you are using the wrong environment? Do you use the editable install? Can you double check? In the new version there is really no "scoringLogic", see here

If you still have issues, ping me on slack, and we can quickly meet

yibeichan commented 5 months ago

my bad, i didn't install the editable one. just reinstalled. it's working now!
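
A quick way to check which copy of a package the current environment would import, using only the standard library. The helper name `module_origin` is made up for this sketch; with an editable install the printed path should point into the cloned repository rather than into site-packages.

```python
import importlib.util


def module_origin(name: str):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints None if reproschema is not installed in the active environment.
print(module_origin("reproschema"))
```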

djarecka commented 5 months ago

> my bad, i didn't install the editable one. just reinstalled. it's working now!

but please look at the output

yibeichan commented 5 months ago

yes, it worked, no errors. and i checked some of those with calc and branching logic, looks good.

yibeichan commented 5 months ago

wait a second, everything is good, but can we put the schema keys in order? like put @context at the beginning. everything else looks in order. and our current context is reproschema/ref/linkml/contexts/reproschema; I guess we'll need to change it to reproschema/master/contexts/reproschema once we merge this PR

{
    "id": "sed_bm_demo_herit_002_i_01",
    "category": "reproschema:Item",
    "description": {
        "en": "@NONEOFTHEABOVE='777,999'"
    },
    "prefLabel": {
        "en": "sed_bm_demo_herit_002_i_01"
    },
    "question": {
        "en": "American Indian or Alaska Native"
    },
    "responseOptions": {
        "choices": null,
        "valueType": "xsd:string"
    },
    "ui": {
        "inputType": "radio"
    },
    "@context": "https://raw.githubusercontent.com/ReproNim/reproschema/ref/linkml/contexts/reproschema"
}

djarecka commented 5 months ago

> wait a second, everything is good, but can we make the schema keys in order? like put @context at the beginning. everything else looks in order.

this is a dictionary, so it doesn't really have an order, but perhaps we can create an ordered dictionary in the write_obj function. I will come back to this PR later.

> and our current context is reproschema/ref/linkml/contexts/reproschema I guess we'll need to change it reproschema/master/contexts/reproschema once we merge this PR

yes, correct, this will have to be changed
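
On the key order: since Python 3.7 a regular dict preserves insertion order, and `json.dumps` writes keys in that order unless `sort_keys=True` is passed. So a small reordering step before writing may be enough. A sketch, with `context_first` as a hypothetical helper name:

```python
import json


def context_first(data: dict) -> dict:
    """Return a new dict with '@context' first and the other keys in their original order."""
    ordered = {}
    if "@context" in data:
        ordered["@context"] = data["@context"]
    ordered.update((key, value) for key, value in data.items() if key != "@context")
    return ordered


item = {
    "id": "sed_bm_demo_herit_002_i_01",
    "category": "reproschema:Item",
    "@context": "https://raw.githubusercontent.com/ReproNim/reproschema/ref/linkml/contexts/reproschema",
}
text = json.dumps(context_first(item), indent=4)
```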

djarecka commented 5 months ago

Text Validation Min is mapped to valueMin here, but this is sometimes text, e.g. [screening_arm_1][setup_lmp]. What should be the logic here?

@yibeichan - can you help me with this?

yibeichan commented 5 months ago

> for Text Validation Min, here are some examples. can we use Text Validation Type to help us set them up in reproschema? e.g., we set up valueType along with minValue and maxValue? [image]

@djarecka would this be helpful?
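
A sketch of what that could look like: choose valueType from the validation type, and only emit minValue/maxValue when the cell actually parses as a number, so that text such as `[screening_arm_1][setup_lmp]` is skipped instead of being written as a bound. The mapping table and the function names are assumptions for illustration, not the logic in redcap2reproschema.

```python
# Hypothetical mapping from REDCap "Text Validation Type" values to XSD value types.
VALIDATION_TO_VALUE_TYPE = {
    "integer": "xsd:integer",
    "number": "xsd:decimal",
    "date_ymd": "xsd:date",
}


def _as_number(text: str):
    """Return an int or float parsed from text, or None if it is not numeric."""
    text = text.strip()
    for parse in (int, float):
        try:
            return parse(text)
        except ValueError:
            continue
    return None


def response_options(validation_type: str, min_text: str = "", max_text: str = "") -> dict:
    """Build a responseOptions-like dict from the three validation columns."""
    options = {"valueType": VALIDATION_TO_VALUE_TYPE.get(validation_type, "xsd:string")}
    for key, raw in (("minValue", min_text), ("maxValue", max_text)):
        number = _as_number(raw)
        if number is not None:
            options[key] = number  # non-numeric bounds are left out
    return options
```

What to do with a non-numeric bound (drop it, keep it as a note, or resolve the referenced field) is still the open question; the sketch simply drops it.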

djarecka commented 5 months ago

@yibeichan - can you check if the fields with compute make sense? I checked the phq9 as Satra suggested and there is indeed no isVis there

yibeichan commented 5 months ago

@djarecka yes, isVis should be False for compute items.

but the newly generated compute items are not correct. there are many repeated items; you can see a lot of pex_bm_apa1_flag01 here. This is from pex_bm_apa_schema (I can't attach the file here but sent it to you on slack. this is generated from the HBCD csv I shared with you on slack last time)

{
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"
        },
        {
            "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
            "variableName": "pex_bm_apa1_flag01"

djarecka commented 5 months ago

@yibeichan - ok, will check the repetition.
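
As a stopgap while the cause of the repetition is tracked down, the compute list could be de-duplicated before it is written. A minimal sketch with a hypothetical helper `dedupe_compute`; it treats two entries as the same when both variableName and jsExpression match, and it does not address why the entries are generated repeatedly in the first place.

```python
def dedupe_compute(entries: list) -> list:
    """Drop repeated compute entries, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for entry in entries:
        key = (entry.get("variableName"), entry.get("jsExpression"))
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


flag = {
    "jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
    "variableName": "pex_bm_apa1_flag01",
}
compute = dedupe_compute([dict(flag) for _ in range(9)])
```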

btw, I believe I also found an error with inputType in the code. I started checking more, and I realized that in one of the CSVs you gave me there are many Field Types. Do you know what we should do with checkbox or file?

yibeichan commented 5 months ago

@djarecka for checkbox we can use multipleChoice. for file, I found it in the bridge2ai csv, where it requires a pdf. we don't have equivalent fields in reproschema (we have audioObject, videoObject, image, contentUrl); maybe we can use something from schema.org. I can think of [DigitalDocument](https://schema.org/DigitalDocument) or [Thing](https://schema.org/Thing). what do you think? @satra

djarecka commented 5 months ago

@yibeichan - just to be clear, we are talking about inputType, and this is the list I'm using as a reference for what is supported by the UI. I guess checkbox can be select, but I don't see file.

Also, I'm not sure what other options you might have in other REDCap CSVs. Perhaps I will add some exceptions, because right now they are ignored

satra commented 5 months ago

the demo protocol has a file upload item. perhaps check that out.

yibeichan commented 5 months ago

@djarecka it's documentUpload https://github.com/ReproNim/demo-protocol/blob/16a54ef5eeaef7282d9ed0410cc22ff1bfef6e71/activities/Activity1/items/document_upload_item#L11

we have it in UI here: https://github.com/ReproNim/reproschema-ui/blob/05096f07b3c27e82ee9382849b073c0896448a47/src/components/Inputs/DocumentUpload/DocumentUpload.vue#L2
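
Putting the suggestions from this exchange together, a lookup that raises on unknown field types (instead of silently ignoring them) might look like the sketch below. The table only contains the types mentioned in this thread; the radio entry and the names `FIELD_TYPE_TO_INPUT_TYPE` and `input_type_for` are assumptions for illustration.

```python
# REDCap field type -> reproschema inputType, limited to the types named in this thread.
FIELD_TYPE_TO_INPUT_TYPE = {
    "radio": "radio",          # assumed one-to-one
    "checkbox": "select",      # combined with multipleChoice in the response options
    "file": "documentUpload",  # as used in the demo protocol
}


def input_type_for(field_type: str) -> str:
    """Return the inputType for a REDCap field type, or raise if it is not mapped."""
    try:
        return FIELD_TYPE_TO_INPUT_TYPE[field_type]
    except KeyError:
        raise ValueError(f"Unsupported REDCap field type: {field_type!r}") from None
```

Raising here makes unmapped field types visible during conversion, which is the point of adding exceptions.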