OpenFn / wcs-wildmeat

Wildlife Conservation Society ConSocSci Project
1 stars 0 forks source link

Enhance postgres table creation job to handle all changed forms in a Kobo project #13

Closed taylordowns2000 closed 3 years ago

taylordowns2000 commented 3 years ago

Question: does the API allow us to return ONLY forms which have been changed since the cursor? Or, must we always fetch a list of all forms, then check to see when they've been updated?

get('/allFormsForKoboProject', state => {
  // expect state.data.result to be a list of all the forms for that project
  const formIds = state.data.result
    .filter(f => f.updatedAt > state.cursor // ???)
    .map(f => f.id)
  return { ...state, formIds };
});

each(
  '$.formIds[*]',
  state => {
    // some Mamadou stuff goes in here?
    return describe(..., state => {
      return upsert/insertTables(...)(state);
    });
);
lakhassane commented 3 years ago

@taylordowns2000 no PR yet but a question/argumentation.

I don't think we need cursor to fetch forms.

Also for a specific project (tag) we will always have one single form definition then we can trigger create-table inside the second operation of that fetch-form job.

You can check the commit for more context.

taylordowns2000 commented 3 years ago

That makes sense, but it's inefficient. with a couple of lines of code we could cut out lots of querying, processing, and redundancy.

By using the cursor, we're ensuring that we don't start the costly check-existing/compare/upsert-table sequence unless we need to. This scales much better.

Is there another reason in favor of not using a cursor?

lakhassane commented 3 years ago

@aleksa-krolls can we have more precision about what is consider as a project in Kobo.

  1. Is a project an instance of Kobo thus a set of forms?
  2. Is a project just one form of a kobo instance?
aleksa-krolls commented 3 years ago

@lakhassane Pretty sure a project is an instance of one form and has the unique formId. A Kobo account might have access to multiple projects (or forms). Users can upload new versions of that form to overwrite the form/project that was previously deployed: Example form (that is considered a project): https://kf.kobotoolbox.org/#/forms/afCPRfan8JdBt9sy9q48Jp/summary

If that's not a helpful answer, can you give me context for why you're asking and I'll try to help?

taylordowns2000 commented 3 years ago

This is helpful. I had misunderstood and thought we wanted to fetch all "forms" for a given "project". Can you confirm that the requirement is to fetch (and create tables for) all new or updated projects (each with it's formId, I'd call them forms!) within an account?

Right now we're envisioning the following flow:

  1. get a list of all available forms for a username/password combo.
  2. filter that list to show only those forms which have been created or modified since the last run.
  3. for each form, (a) upsert the required table(s) in Postgres and (b) upsert the required trigger and job in a given OpenFn.org project.

Is this the right thinking? This job could run on a timer, once an hour.

aleksa-krolls commented 3 years ago

That's correct! I think I was also confusing projects and accounts. But yes, we should fetch all projects (forms) that an account might have access to.