capro-uio / nettskjemar

Package to work with UiO nettskjema
https://www.capro.dev/nettskjemar/

changing metadata and access to data files #21

Open finn42 opened 3 years ago

finn42 commented 3 years ago

I'm trying to use this library to retrieve data stored as zip files by an app using nettskjema. nettskjema_get_data() produces a tibble of metadata that 1) switches the user name and data file names and 2) does not retrieve the linked files. Is there a way to prevent this renaming pattern? And is there an option yet to download the data files with this library?

[Screenshot: Screen Shot 2021-10-18 at 18 30 56]

drmowinckels commented 3 years ago

Thanks for the bug report and feature request. There must be something wrong in the coding of attachments (which I am assuming this field really is). So I'll have a peek at what's going on there.

In terms of downloading the file, I have not made code for this, but it's a perfectly reasonable request. I'll have a peek at the API documentation and see what I can come up with.

It won't be this week though, as I am busy with a conference.

drmowinckels commented 3 years ago

From just some small tests on my side, if you activate the codebook on nettskjema and use use_codebook = TRUE, this issue does not persist. If you want to use this package, I recommend having the codebook activated. It is hard to work with the data in R without it, I'm afraid. I'll work on the bug when I can, but looking at it now, it will require some thought about how to fix it when there is no codebook.

finn42 commented 3 years ago

That sounds like a good solution for the naming issue at least. Thank you for looking into it!

drmowinckels commented 3 years ago

I'll have a look later today, I think I can squeeze in some time. But how did you activate encryption of the attachments? When I test on a test form I made, I have no option to encrypt them, and thus I also cannot download them through the API :/

finn42 commented 3 years ago

Hmm. I’m not sure that they are encrypted, and that might be an additional problem. The data are mobile phone accelerometer and GPS recordings collected via the MusicLab app, and I’m not on the app development team so the details are kind of sketchy. Are there encryption specs we should be satisfying for the nettskjema API?

drmowinckels commented 3 years ago

Yes, I think either something is missing in the API documentation I have, or there is no endpoint to retrieve the attachments through the API. I've sent an email to the nettskjema team at USIT and I'll keep you posted on what they say.

finn42 commented 3 years ago

I've managed to retrieve a submission attachment using a curl command in the form:

curl 'https://nettskjema.no/api/v2/submissions/5136794/attachments/113086' -i -X GET -H 'Authorization: Bearer TOKEN' -o 'filename'
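For readers who prefer Python, the same request could be sketched with the standard library as below. This is only a minimal sketch: the URL pattern and the Bearer header come from the curl command above, while the helper name build_attachment_request is hypothetical and not part of any package mentioned in this thread.

```python
from urllib.request import Request

# Base URL taken from the curl command above.
API_BASE = "https://nettskjema.no/api/v2"


def build_attachment_request(submission_id: int, attachment_id: int, token: str) -> Request:
    """Build (but do not send) a GET request for one submission attachment.

    Hypothetical helper: it only mirrors the curl call shown in this thread.
    """
    url = f"{API_BASE}/submissions/{submission_id}/attachments/{attachment_id}"
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


# Same IDs and placeholder token as in the curl example.
req = build_attachment_request(5136794, 113086, "TOKEN")
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would actually send it; that step is left out here because it needs a valid token and network access.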

Within a metadata wrapper, the file contents look to be base64 encoded, so retrievable, though awkward to crack open. To grab the data files, I'll have to request the submission IDs, then query each submission for its attachment ID, then request each attachment ID to get the wrapped, encoded file, and finally decode it. If you ever get this workflow smoothed over in your library, please let me know!
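The final unwrapping step might look roughly like the sketch below. It assumes the wrapper is JSON and that the base64 payload sits under a single key; the key names "content" and "fileName" are made up for illustration, since the real field names are not given in this thread, and the wrapper here is fake data built locally.

```python
import base64
import json


def decode_attachment(wrapper_text: str, content_key: str = "content") -> bytes:
    """Return the raw file bytes from a JSON wrapper holding a base64 payload.

    content_key is an assumption: check the real API response for the
    actual field name before relying on this.
    """
    wrapper = json.loads(wrapper_text)
    return base64.b64decode(wrapper[content_key])


# Fake wrapper built locally so the example runs without network access.
payload = b"PK\x03\x04 demo bytes"
fake_wrapper = json.dumps(
    {
        "fileName": "recording.zip",  # hypothetical metadata field
        "content": base64.b64encode(payload).decode("ascii"),
    }
)

decoded = decode_attachment(fake_wrapper)
print(decoded == payload)
```

The decoded bytes can then be written to disk in binary mode, for example with pathlib.Path("recording.zip").write_bytes(decoded).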

drmowinckels commented 3 years ago

Thanks! This is super helpful! In the API docs, only the "encrypted-attachments" endpoint is mentioned, which is where I was looking for the attachments. Now that I have an endpoint, I think I can get something working.

drmowinckels commented 3 years ago

Hey!

I believe I have a set of functions for grabbing attachments now. I hope they are fairly well documented. Test them out by installing from the development branch on GitHub:

# install.packages("remotes")
remotes::install_github("LCBC-UiO/nettskjemar", ref = "dev")

The easiest solution is to use:

nettskjema_get_form_attachments(form_id)

finn42 commented 3 years ago

Oh cool! I haven't tried it yet, as I fell back on accessing the API through Python's requests library, but this will be helpful for my colleague who mostly works in R. If you are curious about the Python code, I've shared a demo notebook here: https://github.com/finn42/PullingNettskjema/blob/main/PullingNettskjema.ipynb

drmowinckels commented 3 years ago

Base64 decoding was the little bit of code I needed to figure out to make the magic happen. Super happy you flagged this for me to figure out, and also happy you found a way to do it in Python if that is your preferred language :)

The more individual users there are of the API, the better we can advocate for changes to it. It was developed with server-to-server communication in mind, so for them it's really novel to have individual users wanting access to nettskjema in this way.

finn42 commented 3 years ago

Yeah, I had a ticket open about the IP address restrictions, and have shared the same Python Jupyter notebook with them. Hopefully it will encourage them to consider how to make access a bit more practical. There were a lot of little details (like the base64 encoding) that could have been documented better.