UDub-Impact / OData-Connector

Google Data Studio connector designed for the Open Data Kit OData API.
MIT License

Forms with repeats don't work #8

Closed: lognaturel closed this issue 3 years ago

lognaturel commented 3 years ago

I expected that with a form with repeats, the base Submissions table could still be accessed, just not the repeat tables. However, it looks like any attempt to connect to a form with a repeat gives a Data Set Configuration Error.

When I try to connect to the Submissions table, I see fields from the repeat, so I think the issue is that all of the fields returned by the form schema endpoint are used no matter which table is requested. This is a weakness of using that endpoint to define the schema instead of using the OData metadata document. One possible way to overcome it: if a table name other than Submissions is specified, only include nodes with that table name as a prefix (e.g. if the table name is repeat, only include nodes with a repeat prefix). I am not sure what happens with repeats nested in groups or other repeats, so that would be something to check (e.g. is the OData table name Submissions.group.repeat? Something else?). If Submissions is specified, also check what other tables exist and filter out any nodes with those prefixes.

https://docs.google.com/spreadsheets/d/1rHMDO5ZwXDJTmN2XNfViw4-ZjPvFP17BuHU4_Zas00s/edit?usp=sharing is a simple form with questions inside and outside a repeat.
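The prefix filtering idea above could be sketched roughly as follows. This is only an illustration, not the connector's code: it assumes each schema field is an object with a `path` (such as `/repeat/q2`) and a `type` (with `repeat` marking a repeat node), and it assumes that nested tables are named by joining path segments with dots after `Submissions`, which is exactly the open question raised above. The helper names `tableNameToPath` and `fieldsForTable` and the sample fields are hypothetical.

```javascript
// Assumption: "Submissions" maps to the form root, and a name such as
// "Submissions.group.repeat" maps to the path "/group/repeat".
function tableNameToPath(tableName) {
  const parts = tableName.split(".").slice(1);
  return parts.length === 0 ? "" : "/" + parts.join("/");
}

// Keep only the fields that belong directly to the requested table:
// they must sit under the table's path and must not be part of a repeat
// nested deeper inside that table.
function fieldsForTable(fields, tablePath) {
  const repeatPaths = fields
    .filter((f) => f.type === "repeat")
    .map((f) => f.path);
  const nestedRepeats = repeatPaths.filter(
    (r) => r !== tablePath && r.startsWith(tablePath + "/")
  );
  return fields.filter((f) => {
    if (!f.path.startsWith(tablePath + "/")) return false;
    return !nestedRepeats.some(
      (r) => f.path === r || f.path.startsWith(r + "/")
    );
  });
}

// Hypothetical schema for a form with one question outside a repeat
// and one question inside it.
const sampleFields = [
  { path: "/meta", type: "structure" },
  { path: "/meta/instanceID", type: "string" },
  { path: "/q1", type: "string" },
  { path: "/repeat", type: "repeat" },
  { path: "/repeat/q2", type: "string" },
];

const submissionPaths = fieldsForTable(
  sampleFields,
  tableNameToPath("Submissions")
).map((f) => f.path);
// submissionPaths is ["/meta", "/meta/instanceID", "/q1"]

const repeatTablePaths = fieldsForTable(
  sampleFields,
  tableNameToPath("Submissions.repeat")
).map((f) => f.path);
// repeatTablePaths is ["/repeat/q2"]
```

The repeat node itself is dropped from the parent table here. Whether it should instead be exposed in some form is a design choice this sketch does not settle.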

wenjunsun commented 3 years ago

Hello Helen, I uploaded a form with a repeat to ODK Central and tested it with our connector. I didn't get a Data Set Configuration Error, and the returned data looks fine (as shown below), except that the repeat data fields are returned as null. Is that the same for you, or do you not get any data at all? Can you provide a schema that gives you the error? I used the schema from here: https://sandbox.central.getodk.org/#/projects/124/forms/single-repeat.

[image]

lognaturel commented 3 years ago

Huh, indeed! I wonder whether maybe I was using an old version of the connector. I can confirm I get the same behavior you've described for both the data explorer and a report. This is with the same form I was using previously.

It looks like the table name configuration is currently ignored, then? Perhaps we can track possible changes around that at https://github.com/UDub-Impact/OData-Connector/issues/7.

I think it is important to make sure that only fields for the current table are available. If there isn't a practical way to allow picking repeat tables, then it would be ok to default to Submissions, not prompt for table name at all, and make sure that any repeats are entirely omitted from the data set. We can discuss once we see how #7 pans out.

lognaturel commented 3 years ago

I had left a report open containing a data table with just q1 from the form in the original issue, and when I came back to it, it showed the "Data Set Configuration Error". Details show:

Debug
ODK central API connector by UW Impact++

SyntaxError: Unexpected end of JSON input

getData:758
Error ID: 6dfc19d8

If I try adding the source to a new report or starting a new explore session on it, I continue to get that error.

noorassan commented 3 years ago

Hmm, that's a really strange error -- so you're saying that q1 was the only dimension being shown in the data explorer, and it worked initially but then had the configuration error when you returned to it later? That makes me wonder if it has to do with our ODK token expiring or something along those lines, but I thought that we checked for that. I'm having trouble recreating the error at the moment. Are you able to recreate it, or did it just occur once?

If you could also let us know what line 758 is for you in the connector code, that would be very helpful 😄

noorassan commented 3 years ago

It turns out that for larger forms (more than roughly 50,000 rows), the JSON response we receive from the data source is truncated by Google Apps Script, which results in the JSON parsing error. This has been addressed by fetching the data in chunks of 50,000 rows at a time. However, accessing larger forms seems to cause some performance issues with the connector and might still fail. See #10.
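For readers curious what chunked fetching might look like, here is a minimal sketch. It is not the connector's actual code. It assumes the OData table URL accepts the `$top` and `$skip` query options and that a caller-supplied callback returns one page of rows as an array. The names `buildPageUrl`, `fetchAllRows`, and `fetchPage` are hypothetical.

```javascript
// Build the URL for one page of rows using the OData $top and $skip options.
function buildPageUrl(tableUrl, skip, top) {
  return tableUrl + "?%24top=" + top + "&%24skip=" + skip;
}

// Collect all rows by requesting fixed-size pages until a short page arrives.
// fetchPage(skip, top) is a hypothetical callback that returns an array of
// rows; keeping it separate lets the loop run outside Apps Script.
function fetchAllRows(fetchPage, pageSize) {
  const rows = [];
  let skip = 0;
  while (true) {
    const page = fetchPage(skip, pageSize);
    for (const row of page) {
      rows.push(row);
    }
    // A page shorter than pageSize (including an empty one) is the last page.
    if (page.length < pageSize) {
      break;
    }
    skip += pageSize;
  }
  return rows;
}
```

In Apps Script, the callback would presumably wrap `UrlFetchApp.fetch` on the URL from `buildPageUrl` and parse each page separately, so that no single response grows large enough to be truncated. When the row count is an exact multiple of the page size, the loop makes one extra request that returns an empty page.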