DigitalCommons / open-data

0 stars 0 forks source link

Use the LimeSurvey API to automatically get the latest responses from LimeSurvey #2

Closed joebillings closed 4 years ago

joebillings commented 5 years ago

Currently a major part of generating LOD is exporting the LimeSurvey responses and making sure they're in the right location for the generation scripts. We should use the LimeSurvey API to automate this step.

ColmMassey commented 4 years ago

Note: Export responses export as question code Complete only

wu-lee commented 4 years ago

How do we do this currently, manually?

wu-lee commented 4 years ago

Seems to be related (duplicated by?) #6

wu-lee commented 4 years ago

Ok, I've cobbled a basic proof of concept together (inlined below). It's a NodeJS script, possibly we need to translate it into Ruby for consistency with our sausage-mechanics. (There were no examples in Ruby.)

To use it we first have to enable the API in RPCJSON mode, by logging in to the LimeSurvey administration, going to "Global settings", choosing the tab "Interfaces" and selecting the JSON-RPC service. I've done this.

The result, when successful, is CSV data on standard output. I've not inspected it in detail, but it looks plausible to my untrained eye.

(See options for the export function here: https://api.limesurvey.org/classes/remotecontrol_handle.html#method_export_responses)

@ColmMassey - once we've got the data, what do we do with it?

/** Exports a Lime Survey as CSV.
 *
 * Usage:
 *
 *    export-survey.js <username> <password> <survey-id>
 *
 * Adapted from the slightly shonky NodeJS example here: 
 * https://manual.limesurvey.org/RemoteControl_2_API#NodeJS_example
 */ 

const http = require('https');
const process = require('process');

const baseOptions = {
    hostname: "solidarityeconomyassociation.limequery.com",
    path: "/index.php/admin/remotecontrol",
    keepAlive: true,
    agent: false,
    method: 'POST',
    headers: {
    'user-agent': 'Apache-HttpClient/c4.2.2 (java 1.5)',
    'host': 'solidarityeconomyassociation.limequery.com',
    'path': '/index.php/admin/remotecontrol',
    'content-type': 'application/json'
    },
};

async function rpc(method, ...params) {
    // Copy/extend the base options
    const options = Object.assign({}, baseOptions);

    return new Promise((resolve, reject) => {
    // console.log("getting", options);
    const req = http.request(options);
    let buf = '';
    req.on('response', res => {
        //console.log("got", res);
        if (res.statusCode == 200) {
        res.setEncoding('utf8');
        res.on('data', (chunk) => {
            // console.log("chunk", chunk);
            buf += chunk;
        });
        res.on('end', () => {
            const data = JSON.parse(buf);
            if (data.error)
            reject(data.error);
            resolve(data.result);
        });
        }
        else
        reject(res.statusCode);
    });

    req.on('error', e => {
        reject(e);
    });

    const body = JSON.stringify({
        method: method,
        params: params,
        id: 1,
    });
    // console.log("body", body);
    req.write(body);

    req.end();
    });
};

const user = process.argv[2];
const password = process.argv[3];
const survey_id = process.argv[4];
async function main() {
    try {
    const session_key = await rpc('get_session_key', user, password);

    //console.log("session key", session_key);

    const responseb64 = await rpc(
        'export_responses',
        session_key,
        survey_id,
        'csv', 'en'
    );
    const data = String(Buffer.from(responseb64, 'base64'));
    console.log(data);

    await rpc('release_session_key', session_key).resolve();
    }
    catch(e) {
    console.error("error",e);
    }
}

try {
    main();
}
catch(e) {
    console.error("error",e);
}
ColmMassey commented 4 years ago

once we've got the data, what do we do with it?

It gets saved into the appropriate original-data folder such as

https://github.com/SolidarityEconomyAssociation/open-data/tree/master/newcastle-pilot/2019-07-03/original-data

The Newcastle data has a date stamp in the file name e.g. 2019-09-17-original.csv.

The sausage factory code is then called to act upon this file.

wu-lee commented 4 years ago

A Ruby translation of this script now committed here:

https://github.com/SolidarityEconomyAssociation/open-data/tree/lime-survey-exporter

Not certain if this completely fulfills the requirements of #6, but it should work for manual use, esp. when integrated with the Makefiles (if we're keeping them as is? @dtmakm27 can you comment?)

May also need a log-in stored in the password store. This probably shouldn't be a personal account, but something with limited permissions (as limited as possible).

ColmMassey commented 4 years ago

Not certain if this completely fulfills the requirements of #6,

I think the Limesurvey tickets need a little work to tease apart the different elements, which I have started.

ColmMassey commented 4 years ago

I assume the script gets all the completed survey entries. Was there an option in the API to get all the survey options that have changes since a given timestamp? Not needed for now, but useful to know if it is available.

ColmMassey commented 4 years ago

Can we do the following queries?

Have there been any changes to a specified survey since the last build? Knowing this we would rebuild whenever there was any change.

Have there been any changes to any of the survey entries that have a specified boolean field set to True, since the last build. Knowing this we could rebuild whenever any change was made to Approved survey entries.

ColmMassey commented 4 years ago

Both Newcastle and Oxford surveys now have populated approved fields to use. We also have the new location field, which can hold a lat and long pair, semi-colon seperated. If that data is present it should take presidence over geolocation derived from postcode.

wu-lee commented 4 years ago

Ok, an update: there is a nominally working deployment of open-data on dev-0 which updates the data here:

https://dev.w3id.solidarityeconomy.coop/sea-lod/oxford-pilot/

There are some oddities there, the following of which I can confirm are in the original downloaded data:

And I've re-deployed the dev Oxford site to reference this dev data (still using dev.w3id.solidarityeconomy.coop URIs for now, rather than lod.coop URIs of any type):

https://dev.data.solidarityeconomy.coop/sea-lod/oxford-pilot

There was a bunch of permissions mole whacking and fiddling with sudo / cron / stmp relay settings required, but it now runs every 30 minutes, pulls the latest code from the lime-survey-downloads branch, then runs the seod command on the current Oxford data folder (which is refactored, with the latest conversion for lime-survey-data; seod is the command-line replacement for the old makefiles, added in the ica-youth-network-housekeeping branch). Specifically, tools/deploy/cronjob in the open-data repository is the script run.

To sidestep the password-store requirement of being installed in a sibling directory, with a GnuPG keyring available, and gpg-agent running with the right passphrase for decrypting, I'm using a private file containing the passwords in ~seopendata/.env on dev-0.

All this is deployed using an ansible playbook here, which I shall merge into the master branch there when I think I've no more tweaks to make to it:

https://github.com/SolidarityEconomyAssociation/technology-and-infrastructure/blob/seod-service/ansible/seod-service.yml

ColmMassey commented 4 years ago

There are some oddities there, the following of which I can confirm are in the original downloaded data:

* several blank entries with just an ID

* a "Zero Carbon Mapping Project" initiative miles from the others near Liverpool

Once we start using the approved flag they will be ignored. Both Oxford and Newbridge have the approved flag and they are set correctly.

wu-lee commented 4 years ago

Ok, I've updated the Oxford schema and added the logic to exclude these unapproved fields, and redeployed it to dev-0.

The latest cron run was at 18:30 and it seems to have removed those spurious initiatives - so it appears to be working as planned.

wu-lee commented 4 years ago

I'd like to create a dedicated account with limited permissions (if that's possible) for the automated lime-survey export.

@ColmMassey, is there an email address I can use for this? tech.accounts at solidarityeconomy.coop perhaps? (The password will get sent to this address.)

ColmMassey commented 4 years ago

@ColmMassey, is there an email address I can use for this? tech.accounts at solidarityeconomy.coop perhaps? (The password will get sent to this address.)

Does it need to be a full account or just an alias? tech.accounts is an alias.

ColmMassey commented 4 years ago

What else is left after the password email thing to finish this issue?

ColmMassey commented 4 years ago

Ok, I've updated the Oxford schema and added the logic to exclude these unapproved fields, and redeployed it to dev-0.

The latest cron run was at 18:30 and it seems to have removed those spurious initiatives - so it appears to be working as planned.

Nice.

wu-lee commented 4 years ago

@ColmMassey, is there an email address I can use for this? tech.accounts at solidarityeconomy.coop perhaps? (The password will get sent to this address.)

Does it need to be a full account or just an alias? tech.accounts is an alias.

An alias is perfect. Doesn't really warrant a full account. The main thing is that it can be shared, because this account exists mainly to limit what can be done should the password be compromised.

wu-lee commented 4 years ago

What else is left after the password email thing to finish this issue?

If NewBridge is in scope, then finish the conversion (fixing any of the problems in #33) and configure the deployment to also process that data. Send the error notifications somewhere "official" to be monitored by more than just me.

I think other jobs can be separate issues. For instance, we might want to know if it breaks, and how... currently there's an email notification output on every run which is just the text output from the sausage machine. Which isn't really tailored to give a concise report on what's wrong, in fact it's verbose enough that the recipient is likely to start getting sick of reading it and start missing actual problems. This could be improved by writing the more verbose stuff to a log, and then sending only a very terse summary if something goes wrong.

I'm sure there are other rough edges which might need smoothing. Deployments/upgrades and security could be improved I expect.There's a lot of duplicated code in oxford/newbridge, which maybe shouldn't be. The data output might be less fragile when presented with bad input if validation was better. We should probably try and clearly define what format fields are in and render that correctly so that it doesn't come out looking mangled in the generated data (i.e. misinterpretation of angle brackets, mojibake, collapsed paragraphs/bullet points, links, etc.) Documentation. Dealing with schema changes so that Lime Survey and the conversion keep in step. But these are I think out of the scope of this issue.

ColmMassey commented 4 years ago

If NewBridge is in scope, then finish the conversion (fixing any of the problems in #33) and configure the deployment to also process that data. Send the error notifications somewhere "official" to be monitored by more than just me.

If by that you mean 'Is automating Newbridge's limescale download and rebuild need to be done' then yes. Can you see the address of the admin who is emailed everytime a new survey is completed? That would be a good address to use. I do accept that the error messages may be not so useful.

ColmMassey commented 4 years ago

I think other jobs can be separate issues. For instance, we might want to know if it breaks, and how... currently there's an email notification output on every run which is just the text output from the sausage machine. Which isn't really tailored to give a concise report on what's wrong, in fact it's verbose enough that the recipient is likely to start getting sick of reading it and start missing actual problems. This could be improved by writing the more verbose stuff to a log, and then sending only a very terse summary if something goes wrong.

I'm sure there are other rough edges which might need smoothing. Deployments/upgrades and security could be improved I expect.There's a lot of duplicated code in oxford/newbridge, which maybe shouldn't be. The data output might be less fragile when presented with bad input if validation was better. We should probably try and clearly define what format fields are in and render that correctly so that it doesn't come out looking mangled in the generated data (i.e. misinterpretation of angle brackets, mojibake, collapsed paragraphs/bullet points, links, etc.) Documentation. Dealing with schema changes so that Lime Survey and the conversion keep in step. But these are I think out of the scope of this issue.

Can you copy these into a big unassigned ticket for next time we revisit Limesurvey?

wu-lee commented 4 years ago

Ok, update:

The deployment of open-data on dev-0 tracks a new branch in open-data, dev. Deployments to this branch will get pulled before the processing job is run.

For production data, I plan to re-deploy this application on sea-0, tracking the master branch of open-data, and deploying data with production URIs. Then changes to the lime-survey data will be processed both using the master and dev branches and get published on the relevant host (data1.solidarityeconomy.coop and dev.data.solidarityeconomy.coop respectively).

After that, this issue can I think be closed.

wu-lee commented 4 years ago

Can you see the address of the admin who is emailed everytime a new survey is completed? That would be a good address to use.

I can see the email address in the general settings for the Oxford and Newbridge surveys - they are the same, info at solidarityeconomy coop. But only by logging in and looking, at the moment. If you wanted each survey to go to a different user (as each survey might have a different admin address) then this would need some extra stuff. As would updating it should the settings in Lime Survey change.

Currently I'm just letting things work by default: any output from a cron job gets mailed to the user it's run as (seopendata in this case). So I set that account to forward email to the relevant address. (To have more than one destination would need some routing logic over and above this, which intercepts the usual cron email notification and forwards it elsewhere; or possibly a user account per survey.)

I've checked this works, insofar as dev-0 is concerned. The email gets sent. However, evidently Web Architects handle the email for solidarityeconomy.coop, and their MTA rejects the connection. So I'll need to get in touch with Chris and ask for dev-0 to be permitted to relay mail to it.

wu-lee commented 4 years ago

Can you copy these into a big unassigned ticket for next time we revisit Limesurvey?

See #36

wu-lee commented 4 years ago

Update:

This seems to work - I can see the data timestamps update on the website, and the content looks right to me. However, the mail server on sea-0 appears to be broken, in that the notifications don't get sent. /var/log/syslog includes errors like this:

Jul  7 10:26:01 sea-0 CRON[24861]: (seopendata) CMD (git -C /home/seopendata/working pull -f && . '/home/seopendata/.env' && '/home/seopendata/working/tools/deploy/cronjob' )
Jul  7 10:26:52 sea-0 cron[659]: /etc/mail/submit.cf: line 466: readcf: option RunAsUser: unknown user smmsp
Jul  7 10:26:52 sea-0 sendmail[24967]: NOQUEUE: SYSERR(seopendata): /etc/mail/submit.cf: line 466: readcf: option RunAsUser: unknJul  7 10:26:01 sea-0 CRON[24861]: (seopendata) CMD (git -C /home/seopendata/working pull -f && . '/home/seopendata/.env' && '/home/seopendata/working/tools/deploy/cronjob' )
Jul  7 10:26:52 sea-0 cron[659]: /etc/mail/submit.cf: line 466: readcf: option RunAsUser: unknown user smmsp
Jul  7 10:26:52 sea-0 sendmail[24967]: NOQUEUE: SYSERR(seopendata): /etc/mail/submit.cf: line 466: readcf: option RunAsUser: unknown user smmsp
Jul  7 10:26:52 sea-0 cron[659]: /etc/mail/submit.cf: line 486: readcf: option TrustedUser: unknown user smmsp
Jul  7 10:26:52 sea-0 sendmail[24967]: NOQUEUE: SYSERR(seopendata): /etc/mail/submit.cf: line 486: readcf: option TrustedUser: unknown user smmsp
Jul  7 10:26:52 sea-0 sendmail[24967]: NOQUEUE: SYSERR(seopendata): can not write to queue directory /var/spool/mqueue-client/ (RunAsGid=0, required=118): Permission denied
Jul  7 10:26:52 sea-0 CRON[24860]: (seopendata) MAIL (mailed 9700 bytes of output but got status 0x0046 from MTA#012)
own user smmsp
Jul  7 10:26:52 sea-0 cron[659]: /etc/mail/submit.cf: line 486: readcf: option TrustedUser: unknown user smmsp
Jul  7 10:26:52 sea-0 sendmail[24967]: NOQUEUE: SYSERR(seopendata): /etc/mail/submit.cf: line 486: readcf: option TrustedUser: unknown user smmsp
Jul  7 10:26:52 sea-0 sendmail[24967]: NOQUEUE: SYSERR(seopendata): can not write to queue directory /var/spool/mqueue-client/ (RunAsGid=0, required=118): Permission denied
Jul  7 10:26:52 sea-0 CRON[24860]: (seopendata) MAIL (mailed 9700 bytes of output but got status 0x0046 from MTA#012)

I will have to cautiously look into fixing this; it doesn't reflect well on the state of sea-0.

@ColmMassey - would you be able to check the data looks correct to you? Currently the production cron-job is run every 2 minutes (for testing), so you could also check that changes to the data propagate to production deployments, by making a minor change.

ColmMassey commented 4 years ago

I can see the email address in the general settings for the Oxford and Newbridge surveys - they are the same, info at solidarityeconomy coop. But only by logging in and looking, at the moment. If you wanted each survey to go to a different user (as each survey might have a different admin address) then this would need some extra stuff. As would updating it should the settings in Lime Survey change.

In Notifications & Data there are explicit notification email addresses. For newbridge, colm@solidarityeconomy.coop & admin@thenewbridgeproject.com are supposed to be recieving a full email each time there is a survey completed. It's that email I was refering to.

ColmMassey commented 4 years ago

@ColmMassey - would you be able to check the data looks correct to you? Currently the production cron-job is run every 2 minutes (for testing), so you could also check that changes to the data propagate to production deployments, by making a minor change.

Small change in data quickly rippled through to map. :-)

wu-lee commented 4 years ago

In Notifications & Data there are explicit notification email addresses. For newbridge, [colm@solidarityeconomy.coop]> (mailto:colm@solidarityeconomy.coop) & admin@thenewbridgeproject.com are > supposed to be recieving a full email each time there is a survey completed. It's that email I was refering to.

Is it appropriate for all of the above people, plus equivalents for Oxford, getting the emails as currently being sent to info@, once a day? Actually twice: once from sea-0, once from dev-0.

Like this: cron-notification.txt

I'd suggest these emails are really more for admins like Dean or I, who could recognise when they go wrong and take action. And possibly you too - but I think anyone else would feel spammed. Using a SEA email alias, however, would mean that controlling where they get sent is under your control, and not hard-wired into the deployment scripts.

wu-lee commented 4 years ago

sea-0 mail now fixed, rather crudely but effectively by installing sendmail, purging it, then installing postfix.

cron-job updated to make a forced checkout of the remote head of whatever branch is checked out - this means it won't get stuck when the remote repo makes a non-fast-forward-able change to that branch, as sometimes occurs.

So modulo choosing email addresses for notifications, this issue is now closable, @ColmMassey