Closed djarecka closed 5 months ago
looks like a reasonable approach. a few comments.
model_autogen.py and then have a model.py that simply imports and adjusts (since pydantic 2.0 has an injection mode for modification) any of the autogen classes. validate can then import from model as it does.
ok, I understand that all the changes that we were doing to the pydantic model, including changing the type of langString to Dict[str, str], should be done here in model.py.
@satra - should we include in reproschema something like _ldmeta (from dandi schema) to record that a class is reproschema:Protocol, etc.?
Something would be needed if we want to keep tests like this
you already have that in the meaning/mapping/schema_uri components of the linkml schema. the generator does not add that to pydantic, and that may be good to add to the metadata. i thought you or puja was going to look into how to add other metadata to the generated code on the linkml side.
more specifically, those tests are really not helpful at the moment :)
Yes, indeed, there is a current PR that should add linkml_metadata automatically
could we also do the things we discussed earlier in the thread (#36 (comment))?
yes, I will work from the new redcap2reproschema scripts that @yibeichan updated
@djarecka - do make sure there are round-trip tests: load -> save -> compare
@ibevers - check test_schema now. Yesterday it was a combination of load_file incorrectly treating my paths as urls and me providing a wrong link to the reproschema context
There is at least one problem that I have to fix with writing out the jsonld from the pydantic object: if the value of the response is an integer, model_dump creates Decimal(0), which is not serializable
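A minimal sketch of one way around the Decimal problem, assuming the dumped dictionary is written with the standard json module. The helper name decimal_default and the sample data are illustrative only and are not part of reproschema-py.

```python
import json
from decimal import Decimal


def decimal_default(obj):
    """Fallback for json.dumps: turn Decimal into int or float."""
    if isinstance(obj, Decimal):
        # Keep whole numbers as int so that 0 is written as 0, not 0.0.
        if obj.is_finite() and obj == obj.to_integral_value():
            return int(obj)
        return float(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Hypothetical dump of a response option whose value came back as Decimal(0).
dumped = {"choices": [{"name": {"en": "No"}, "value": Decimal(0)}]}
serialized = json.dumps(dumped, default=decimal_default)
```

json.dumps only calls the default hook for objects it cannot serialize on its own, so plain ints and strings are left untouched.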
yes, I just had some issues and wanted to push it to share unfinished tests with Isaac. You can see some tests now in test_schema.
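For reference, a load -> save -> compare check could look roughly like the sketch below. It uses plain json files as a stand-in; the real tests would go through the package's own loader and writer, and the helper names here (save_json, load_json, roundtrip_equal) are made up for the example.

```python
import json
import tempfile
from pathlib import Path


def save_json(obj, path):
    """Write obj as JSON with a stable key order."""
    Path(path).write_text(json.dumps(obj, indent=2, sort_keys=True))


def load_json(path):
    """Read a JSON document back into Python objects."""
    return json.loads(Path(path).read_text())


def roundtrip_equal(original):
    """Save, load, save again, and check that nothing changed on the way."""
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "first.jsonld"
        second = Path(tmp) / "second.jsonld"
        save_json(original, first)
        loaded = load_json(first)
        save_json(loaded, second)
        return loaded == original and first.read_text() == second.read_text()
```

Comparing both the loaded object and the second written file catches changes introduced by either the loader or the writer.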
Also, I've slightly changed load_file, since it was causing issues for my tests (not sure why it wasn't causing issues earlier) and the logic was not clear to me. I think it tried to read my file as a url. Now I just created an if/else for either url or file, and use server for file. Hope I didn't miss any use case that you wanted to cover
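The url-or-file branching described here can be sketched as follows. This is not the actual load_file code; classify_source is a hypothetical helper, and treating only http/https as urls is an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(path_or_url):
    """Return "url" or "file", or raise if the input is neither."""
    parsed = urlparse(str(path_or_url))
    # Only treat the input as a url when it has an http(s) scheme and a host.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if Path(path_or_url).exists():
        return "file"
    raise ValueError(f"{path_or_url} is not a valid URL or file path")
```

Checking the scheme first means a local path is never sent to the network, and a url is never looked up on disk.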
@yibeichan - when you have a chance you can check the new redcap2reproschema. There are still some things that I want to update (e.g. moving the creation of the directory structure for activity and items to write_obj_jsonld, as Satra suggested), but you can let me know if you have other suggestions
the result looks good! I didn't get errors this time
@yibeichan - could you please check if new reproschema2redcap works for you?
@yibeichan - so I fixed the way the activity_path is created in reproschema2redcap.py, but I also had to change the protocol in the tests
thank you! I just tested using the same b2ai data as last time, here is the error info, looks like something related to validation? I did git pull your latest change. did I miss something here
Found schema file: b2ai-redcap2rs/b2ai-redcap2rs/b2ai-redcap2rs_schema
Traceback (most recent call last):
File "/Users/yibeichen/miniconda3/envs/tes-reproschema/bin/reproschema", line 33, in <module>
sys.exit(load_entry_point('reproschema', 'console_scripts', 'reproschema')())
File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/Users/yibeichen/Desktop/test/reproschema-py/reproschema/cli.py", line 127, in reproschema2redcap
rs2redcap(input_path_obj, output_csv_path)
File "/Users/yibeichen/Desktop/test/reproschema-py/reproschema/reproschema2redcap.py", line 259, in main
csv_data = get_csv_data(input_dir_path, contextfile, http_kwargs)
File "/Users/yibeichen/Desktop/test/reproschema-py/reproschema/reproschema2redcap.py", line 158, in get_csv_data
prot = Protocol(**parsed_protocol_json)
File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/pydantic/main.py", line 176, in __init__
self.__pydantic_validator__.validate_python(data, self_instance=self)
pydantic_core._pydantic_core.ValidationError: 3 validation errors for Protocol
reproschema:category
Extra inputs are not permitted [type=extra_forbidden, input_value='reproschema:Protocol', input_type=str]
For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden
reproschema:id
Extra inputs are not permitted [type=extra_forbidden, input_value='b2ai-redcap2rs_schema', input_type=str]
For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden
reproschema:order
Extra inputs are not permitted [type=extra_forbidden, input_value=[{'id': '../activities/su...ers_adhd_adult_schema'}], input_type=list]
For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden
@yibeichan - I fixed some issues with context, but you have to create a new b2ai-redcap2rs with redcap2reproschema since context has changed.
Also, I tried a test that compares the csv after running redcap2reproschema and reproschema2redcap to the original one, and I see some of the columns are empty. Perhaps some of them were left out as not important, but perhaps for others we should find a way to add them? One field I removed during the changes was requiredValue (row_data["required"] = "" # response_options.get("requiredValue", "")) since it was never a field of ResponseOption, but should be taken from the activity schema
Hi, I created a new b2ai-redcap2rs, and this time, it worked!
I noticed that some columns are empty. For example, branch logic: I wasn't sure whether b2ai has branch logic, so I tested it on nimh-minimal. I git cloned our nimh-minimal repo, ran reproschema2redcap, and got the following errors:
reproschema reproschema2redcap nimh-minimal nimh-minimal.csv
Found schema file: None
Found schema file: nimh-minimal/nimh_minimal/nimh_minimal_schema
Traceback (most recent call last):
File "/Users/yibeichen/miniconda3/envs/tes-reproschema/bin/reproschema", line 33, in <module>
sys.exit(load_entry_point('reproschema', 'console_scripts', 'reproschema')())
File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/yibeichen/miniconda3/envs/tes-reproschema/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/Users/yibeichen/Desktop/test/reproschema-py/reproschema/cli.py", line 127, in reproschema2redcap
rs2redcap(input_path_obj, output_csv_path)
File "/Users/yibeichen/Desktop/test/reproschema-py/reproschema/reproschema2redcap.py", line 260, in main
csv_data = get_csv_data(input_dir_path, contextfile, http_kwargs)
File "/Users/yibeichen/Desktop/test/reproschema-py/reproschema/reproschema2redcap.py", line 179, in get_csv_data
item_json = load_file(
File "/Users/yibeichen/Desktop/test/reproschema-py/reproschema/jsonldutils.py", line 94, in load_file
raise Exception(f"{path_or_url} is not a valid URL or file path")
Exception: nimh-minimal/nimh_minimal/../activities/demo/https:/raw.githubusercontent.com/ReproNim/reproschema-library/80867e36fb2c00563290486bf3f3bbeb3198f5cb/activities/NDA/items/interview_age is not a valid URL or file path
In nimh-minimal, we cited items from reproschema-library, see this one; this might need extra work for parsing.
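The exception above shows a full url being appended to a local directory. A rough sketch of how a reference could be resolved so that urls are passed through untouched is below; resolve_reference is a hypothetical helper, not the function used in reproschema-py, and it assumes the base is a local path to the schema file.

```python
import posixpath
from urllib.parse import urlparse


def resolve_reference(base_file, ref):
    """Resolve ref relative to the directory of base_file, unless ref is a url."""
    if urlparse(ref).scheme in ("http", "https"):
        # Already absolute: joining it with a local directory would corrupt it.
        return ref
    base_dir = posixpath.dirname(base_file)
    return posixpath.normpath(posixpath.join(base_dir, ref))
```

For example, a relative activity reference from nimh-minimal/nimh_minimal/nimh_minimal_schema resolves inside the repository, while an item cited from reproschema-library by its raw.githubusercontent.com url is returned unchanged.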
hello hello, I was using redcap2reproschema to convert HBCD redcap csv to reproschema and got the following errors, it looks like something related to input type = list. sent this csv to you on slack
Error: Error during conversion: 2 validation errors for Item
ui.hidden
Extra inputs are not permitted [type=extra_forbidden, input_value=True, input_type=bool]
For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden
scoringLogic
Extra inputs are not permitted [type=extra_forbidden, input_value=[{'variableName': 'sed_bm...bm_demo_roster_002 ))"}], input_type=list]
For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden
These fields are simply not part of the reproschema model, but they are added in redcap2reproschema. So this is not related to the pydantic version, but the logic in the code is outdated and should be updated. Perhaps scoringLogic is compute?
I fixed the code so that it works for urls, however in one place ResponseOption is provided as a url to another jsonld. This is not supported; if we want to have it we would have to change the code. And it's not clear to me if the model should be changed to have url as an option for ResponseOption? (cc: @satra)
responseOptions can be a url (see: https://github.com/ReproNim/reproschema-library/blob/enh/rc4/activities/GAD7/items/gad7_1#L38) and that pattern is used in many schemas.
ah, I was thinking about this line here, we cited "demo": "https://raw.githubusercontent.com/ReproNim/reproschema-library/80867e36fb2c00563290486bf3f3bbeb3198f5cb/activities/NDA/items/"
I thought this was the one that caused the error
wait, I saw Dorota updated a new commit regarding the context URL, I'll check later today.
that line is also valid and should be respected in any loader.
yes, you mean this line https://github.com/djarecka/reproschema-py/blob/86f3699fdda55802478208b5a376dde0467c1d3b/reproschema/redcap2reproschema.py#L186, right? it should be compute if compute is part of the reproschema model
Should I change the loader so that it puts the content of the additional url into the data? what are the other fields that should be expanded?
Perhaps it should be done in a new PR? I don't think it was supported in the current version.
Or should I just fix reproschema2redcap so it loads the additional file?
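To make the loader-side option concrete, here is a small sketch of expanding a responseOptions url in place. The function name expand_response_options and the injected fetch callable are assumptions for illustration; which other fields should be expanded is still an open question in this thread.

```python
def expand_response_options(item, fetch):
    """Return a copy of item with a url-valued responseOptions replaced by its content.

    fetch is any callable that takes a url and returns the loaded dictionary.
    """
    options = item.get("responseOptions")
    if isinstance(options, str):
        expanded = dict(item)  # shallow copy so the caller's item is not mutated
        expanded["responseOptions"] = fetch(options)
        return expanded
    # Inline responseOptions (or none at all) are returned unchanged.
    return item
```

Passing the fetch function in keeps the sketch independent of how files and urls are actually loaded, and makes it easy to test with a fake.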
Yes it does have compute, see here, and it was not changed by me, compute was introduced much earlier.
But, I'm not sure about hidden
@yibei - converting nimh-minimal should work now.
as for the scoringLogic, perhaps you can help me tomorrow, the code doesn't look good to me, and I'm pretty sure that it doesn't do what it was meant to, so we could try to fix it together.
@yibeichan - please check how this works for your examples, as I mentioned I believe there were multiple parts that were not working as intended that I tried to fix, hopefully I didn't break something else.
A few points that I wasn't sure about:
- "Branching Logic (Show field only if...)" and "Field Type" == "calc". Which should have higher priority when deciding on isVis?
- Text Validation Min is mapped to valueMin here, but this is sometimes text, e.g. [screening_arm_1][setup_lmp]. What should be the logic here?
@satra, a few questions to you: there are some parts of the code that are commented out or not used, and I'm not really sure how they would fit into the current schema. Should I completely remove them, or are we missing something in the schema (in that case I would open an issue so we don't forget)? I'm talking about:
isVis should be based on branching logic if it says show. i believe we were using calc to say a field should not be shown. so if both are true for a field, we need to ensure that both calc is there in compute and isVis is dependent on branching logic. we should see a few examples where this is true.
matrix_group_name/count should stay (we may use that later in the ui component). i don't know what identifiable does. would need to see an example (perhaps it's in the abcd schema or in the umass schema).
and what about when it is "calc", i.e. compute in Reproschema? Previously the code was using hidden, but we thought that it would now be isVis in the Activity schema.
ok, will keep for now
here are some examples where both calc and branching logic exist. I think we can still keep these items hidden but add branching logic to the activity schema where those items are. basically, when both calc and branching logic exist, it means we do different calculations based on different conditions, but they are still calculations.
for Text Validation Min, here are some examples, can we use Text Validation Type to help us set them up in reproschema? e.g., we set up valueType along with minValue and maxValue?
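A sketch of what that could look like: the validation type picks the valueType, and min/max are only kept when they are actually numeric, so a piped value such as [screening_arm_1][setup_lmp] is skipped. The VALUE_TYPES table and both function names are assumptions for illustration, not the mapping used in redcap2reproschema.

```python
# Hypothetical mapping from REDCap "Text Validation Type" to a reproschema valueType.
VALUE_TYPES = {
    "integer": "xsd:integer",
    "number": "xsd:decimal",
    "date_ymd": "xsd:date",
}


def numeric_bound(raw):
    """Return raw as int or float, or None when it is empty or not a number."""
    text = str(raw).strip()
    if not text:
        return None
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue
    return None


def response_options(validation_type, minimum, maximum):
    """Build a responseOptions dict, dropping bounds that are not numeric."""
    options = {"valueType": VALUE_TYPES.get(validation_type, "xsd:string")}
    for key, raw in (("minValue", minimum), ("maxValue", maximum)):
        bound = numeric_bound(raw)
        if bound is not None:
            options[key] = bound
    return options
```

Non-numeric bounds are silently dropped here; logging or collecting them instead would be a reasonable alternative if they need to be reviewed by hand.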
phq9 total score is a calculated item, so that activity schema is a good example for what to do with a calc only item and isVis.
also, i guess because of what we mentioned above, I still got the same errors when converting HBCD
ui.hidden
Extra inputs are not permitted [type=extra_forbidden, input_value=True, input_type=bool]
For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden
scoringLogic
Extra inputs are not permitted [type=extra_forbidden, input_value=[{'variableName': 'sed_bm...bm_demo_roster_002 ))"}], input_type=list]
For further information visit https://errors.pydantic.dev/2.7/v/extra_forbidden
@yibeichan - did you pull the changes?
I did.
@yibeichan - perhaps you are using wrong environment? Do you use the editable install? Can you double check. In the new version there is really no "scoringLogic", see here
If you still have issues, ping me on slack, and we can quickly meet
my bad, i didn't install the editable one. just reinstalled. it's working now!
but please look at the output
yes, it worked, no errors. and i checked some of those with calc and branching logic, looks good.
wait a second, everything is good, but can we make the schema keys in order? like put @context at the beginning. everything else looks in order.
and our current context is reproschema/ref/linkml/contexts/reproschema
I guess we'll need to change it to reproschema/master/contexts/reproschema once we merge this PR
{
  "id": "sed_bm_demo_herit_002_i_01",
  "category": "reproschema:Item",
  "description": {
    "en": "@NONEOFTHEABOVE='777,999'"
  },
  "prefLabel": {
    "en": "sed_bm_demo_herit_002_i_01"
  },
  "question": {
    "en": "American Indian or Alaska Native"
  },
  "responseOptions": {
    "choices": null,
    "valueType": "xsd:string"
  },
  "ui": {
    "inputType": "radio"
  },
  "@context": "https://raw.githubusercontent.com/ReproNim/reproschema/ref/linkml/contexts/reproschema"
}
this is a dictionary, so it doesn't really have an order, but perhaps we can create some ordered dictionary in the write_obj function. I will come back to this PR later.
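For what it's worth, Python dicts keep insertion order (guaranteed since 3.7) and json.dumps writes keys in that order, so reordering before writing is enough. A minimal sketch, where order_keys and its default list of leading keys are assumptions rather than anything in write_obj:

```python
import json


def order_keys(obj, first=("@context", "id", "category")):
    """Return a copy of obj with the keys in first moved to the front."""
    head = {key: obj[key] for key in first if key in obj}
    tail = {key: value for key, value in obj.items() if key not in head}
    return {**head, **tail}


item = {
    "id": "sed_bm_demo_herit_002_i_01",
    "category": "reproschema:Item",
    "ui": {"inputType": "radio"},
    "@context": "https://raw.githubusercontent.com/ReproNim/reproschema/ref/linkml/contexts/reproschema",
}
ordered_text = json.dumps(order_keys(item))
```

Only the top-level keys are reordered; nested dictionaries keep whatever order they already had.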
yes, correct, this will have to be changed
@yibeichan - can you help me with the Text Validation Min question? It is mapped to valueMin here, but this is sometimes text, e.g. [screening_arm_1][setup_lmp]. What should be the logic here?
@djarecka would my earlier suggestion be helpful, i.e. using Text Validation Type to set up valueType along with minValue and maxValue?
@yibeichan - can you check if the fields with compute make sense? I checked the phq9 as Satra suggested and there is indeed no isVis there
@djarecka yes, isVis should be False for compute items.
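One possible reading of this rule, as a sketch only: calc fields are always hidden, and other fields use their branching logic as the visibility condition. The function name is_vis is hypothetical, and the case where a field has both calc and branching logic may need different handling than shown here.

```python
def is_vis(field_type, branching_logic):
    """Return the isVis value for one REDCap row.

    field_type is the "Field Type" column; branching_logic is the
    "Branching Logic (Show field only if...)" column (may be empty or None).
    """
    if field_type == "calc":
        # Compute items are never shown to the participant.
        return False
    logic = (branching_logic or "").strip()
    # With branching logic, visibility is that condition; otherwise always visible.
    return logic if logic else True
```

The branching logic string is returned unmodified here; translating REDCap syntax into the expression format used in the activity schema is a separate step.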
but the newly generated compute items are not correct. there are many repetitive items, you can see a lot of pex_bm_apa1_flag01 here. This is from pex_bm_apa_schema (I can't attach the file here but sent it to you on slack. this is generated from the HBCD csv I shared with you on slack last time)
{
"jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
"variableName": "pex_bm_apa1_flag01"
},
{
"jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
"variableName": "pex_bm_apa1_flag01"
},
{
"jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
"variableName": "pex_bm_apa1_flag01"
},
{
"jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
"variableName": "pex_bm_apa1_flag01"
},
{
"jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
"variableName": "pex_bm_apa1_flag01"
},
{
"jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
"variableName": "pex_bm_apa1_flag01"
},
{
"jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
"variableName": "pex_bm_apa1_flag01"
},
{
"jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
"variableName": "pex_bm_apa1_flag01"
},
{
"jsExpression": "if(( pex_bm_apa1_suic_001 > 0 AND pex_bm_apa1_suic_001 < 777), 1, 0)",
"variableName": "pex_bm_apa1_flag01"
@yibeichan - ok, will check the repetition.
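Until the source of the repetition is found, a guard like the sketch below would at least keep identical compute entries from being written more than once. dedupe_compute is a hypothetical helper and it only treats entries as duplicates when both variableName and jsExpression match.

```python
def dedupe_compute(entries):
    """Drop repeated compute entries while keeping the original order."""
    seen = set()
    unique = []
    for entry in entries:
        key = (entry["variableName"], entry["jsExpression"])
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique
```

This hides the symptom rather than fixing the cause, so it is probably best used as a check in tests (input length equals output length) rather than as the fix itself.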
btw., I believe I also found an error with inputType in the code, and I started checking more, and I realized that in one of the csv files you gave me there are many Field Types. Do you know what we should do with checkbox or file?
@djarecka for checkbox we can use multipleChoice. for file, I found it in the bridge2ai csv, where it requires a pdf. we don't have equivalent fields in reproschema (we have audioObject, videoObject, image, contentUrl), maybe we can use something in schema.org, I can think of [DigitalDocument](https://schema.org/DigitalDocument) or [Thing](https://schema.org/Thing). what do you think? @satra
@yibeichan - just to be clear, we are talking about inputType, and this is the list I'm using as a reference for what is supported by the ui.
I guess checkbox can be select, but I don't see file.
Also not sure what other options you might have in other redcap csvs. I will perhaps add some exceptions, because now they are ignored
the demo protocol has a file upload item. perhaps check that out.
@djarecka it's documentUpload https://github.com/ReproNim/demo-protocol/blob/16a54ef5eeaef7282d9ed0410cc22ff1bfef6e71/activities/Activity1/items/document_upload_item#L11
we have it in UI here: https://github.com/ReproNim/reproschema-ui/blob/05096f07b3c27e82ee9382849b073c0896448a47/src/components/Inputs/DocumentUpload/DocumentUpload.vue#L2
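Putting the suggestions from this exchange together, the Field Type to inputType lookup could be sketched like this, raising for unknown types instead of ignoring them. The table below only contains the pairs mentioned in this thread and is not the full mapping in redcap2reproschema; to_input_type is a hypothetical name.

```python
# Field Type -> inputType pairs taken from this discussion only.
INPUT_TYPE_MAP = {
    "radio": "radio",
    "checkbox": "select",
    "file": "documentUpload",
}


def to_input_type(field_type):
    """Return the ui inputType for a REDCap Field Type, or raise if unknown."""
    try:
        return INPUT_TYPE_MAP[field_type]
    except KeyError:
        raise ValueError(f"Unsupported REDCap Field Type: {field_type!r}") from None
```

Raising makes unsupported Field Types visible during conversion, so the table can be extended as new csv files turn up.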
@satra - for now I updated only validate, but can you please take a look if this is what you had in mind?
also, I wasn't sure where the context should be, for now I added it to models