Closed tymees closed 2 years ago
My suggestion would be to look for a column named 'subject', all our templates (should) add this field. Maybe even make it configurable during experiment registration in the app?
As for the two requirements: it should not be hard to wrap everything in a try/except block; as for the fallback, my suggestion would be to use the old method as the fallback.
We could possibly add a subject id to the new session API, and use that in the filename when it's available
Hm, that could work. However, I'd also add it to the existing upload endpoints as an optional parameter. That way you have two workflows:
Another way to solve our problem might be to add an optional 'filename' parameter to the upload API; that way someone can upload multiple sets of data and name them in a way the researcher knows what the different files are.
We've decided to do the following:
To do this, we need to do the following:
DataPoint
model, which will store that piece of metadataaccess_key
is alreadyUploadView.post
with a default value (I would use None). The default value is needed as otherwise Django will error when using the URL path without the new variableRight now the datastore already assigns a participant ID automatically (in UUID format). Thus we have multiple options to aproach this:
subject_id
parameter alongside the participant_id
.participant ID
field to be changed to a regular text fieldOption 1 is not ideal imo, as that would result in two fields that are similarly named and fulfilling similar goals.
As for option 2 vs 3, it's more of a question on who should be responsible for determining the subject ID. You could make the case that it's better to have the datastore be authorative, as it knows more about the other participants. (And thus could better avoid collisions). On the other hand, allowing the experiment to determine it would allow for more flexibility in the format of the subject ID.
(Another point against 2: UUID's are great, but also horrible to parse as humans. So it might not be as userfriendly as a shorter ID).
I'll leave it up to the implementer to decide.
In either case the code for sessions doesn't require that much updating, as it already uses subject ID's. The only thing to add is a way to access the ID from the datapoint. This, again, has multiple options to accomplish this:
DataPoint.session
and read the subject ID through that link
DataPoint
to check where the subject ID is stored (on the DP itself, or through an attached session) and retrieve it. This way, other code that wants to use this ID can just call that instead of having to figure out where to retrieve it fromDataPoint
, fill in the subject_id
field like the non-session method.Option 2 is by far the easiest, as one can just modify BaseUploadView._save_data_point
to accept an optional parameter. (And from a complexity perspective it would probably be the best). However, it would result in saving data in multiple places, so it's maybe not the best option in that regard.
(There's a hidden option 3: create a session record for non-sessioned uploads as well and store the subject ID there in both cases. I don't really like that option, as it would add/save more data than necessary in some cases. But I'm not against it if one decides to go for that as well).
To actually resolve the core issue, both download functions (experiments/utils/download.py
) need to be updated to use the new field to add the ID to the filename. This shouldn't be too hard, but keep in mind to check if there actually is a subject ID to add (as it's technically still optional). I would suggest just adding the subject-ID to the current filename format.
Well that took way too long and might be overly descriptive. Please tell me if anything is unclear
The descriptions above are pretty clear. However, I'm missing a bit of the point or am a little confused - possibly both. Should the research participants fill out their name, our use another id? As it is in the jsPsych boiler plates do something like:
let subject_id = jsPsych.randomization.randomID(8);
But when we substitute the current number with the number as generated by jsPsych above, than we replace one meaningless number by another. So coming back to the original question, what kind of meaning full identifier are we talking about?
The main point, as I gather, is that there's no relation between the contents of the file and the filename. Quote from the mail:
de csv-databestanden zijn genummerd gewoon met een volgnummer, dat blijkt in de praktijk ook behoorlijk onhandig (want dat nummer is in het bestand zelf nergens terug te vinden, dus als je een bepaalde pp weg wilt gooien ben je lang bezig om het juiste bestand te vinden).
But when we substitute the current number with the number as generated by jsPsych above, than we replace one meaningless number by another. So coming back to the original question, what kind of meaning full identifier are we talking about?
Yes, I see your point. We are definitly replacing one meaningless string with another; in my opinion however, it's still a improvement in two ways: 1) at least the meaningless string is the same everywhere and 2) it would make it possible for the experiment to decide how meaningless it should be (to the point that the experiment gets the freedom to just use a name or something. (Well, the last one would be a benefit if we implement it that way).
That does remind me; we probably should be encoding the ID on the jspsych-utils using urlencode if we're going to use path variables
Ok, one more question. In theory the javascript above that generates the subject_id, could generate the same number twice. It's unlikely, but... Do we want to something to prevent that to happen. If we lump the original internal number and the subject_id in the final name, we end up with different names, is that good enough?
If we would always use the session api we could enable a counter, that does ParticipantSession.count += 1
up for each created ParicipantSession and use that as an id. I think every Model has a default pk member right, which does something like this right?
I'll try to work out both methods anyway as it is a good exercise.
It could happen, yes... I think your proposal is a great idea.
If we would always use the session api we could enable a counter, that does ParticipantSession.count += 1 up for each created ParicipantSession and use that as an id. I think every Model has a default pk member right, which does something like this right?
The DataPoint
model already has a counter, so can find an example of that in the code already. It's implemented through a simple PositiveInteger
field on the model, and a signal to set it. (See experiments/signal.py
). A signal is a calback function which can be set to triggered upon certain actions; in this case it's called just before a DataPoint
is saved.
This advantage of using a signal to set the counter's value is not having to remember setting it everywhere one creates a DataPoint
.
Alternatively, one could just use the default pk
value. It's way easier, as it already exists. The only disadvantage I can think of is the fact pk
is a global instance counter, so one could get Session 1, 52, 88
instead of Session 1, 2, 3
in an experiment. If you're only appending the pk
for uniqueness sake, I think it would be fine. For actually displaying the number by itself, not so much.
Shall I close this issue?
Yes!
This feature hast been merged inhttps://github.com/UiL-OTS-labs/web-experiment-datastore/commit/0869a39b0e7c4163670d5c3667f23ba8fe30cfbb
Currently it only contains the internal number of the datapoint. This was done in order to avoid having to look at the data itself to generate a filename.
However, it has been reported that this isn't as useful to the researchers that analyse the data, as there is no relation between filename and data. Thus, we should try to add a more meaninfull identifier to the filename.
The main problem we need to solve is the fact that we cannot 100% rely on the supplied data to follow the same conventions. As such, we need to a) be fault tolerant and b) provide a fallback.