PhilanthropyDataCommons / service

A project for collecting and serving public information associated with grant applications

single row/record bulk upload from csv fails #847

Closed · jmergy closed this 6 months ago

jmergy commented 6 months ago

Downloaded the template, completed 15 or so fields plus the required email. Tried the upload and immediately got an error.

[screenshot of the error]

I'll keep testing, but could it be a permissions issue, since no one outside the devs has done this yet? It felt like that.

jmergy commented 6 months ago

@reefdog I will follow-up if I figure anything out from my side. Will try different browsers, files, etc.

slifty commented 6 months ago

@jmergy can you send the file you're trying to upload to my OTS email?

reefdog commented 6 months ago

CORS strikes again! I was able to upload @jmergy's file to my local PDC, but when I attempt it in production:

[Error] Origin https://app.philanthropydatacommons.org is not allowed by Access-Control-Allow-Origin. Status code: 204
[Error] Fetch API cannot load https://pdc-service.nyc3.digitaloceanspaces.com/ due to access control checks.
[Error] Failed to load resource: Origin https://app.philanthropydatacommons.org is not allowed by Access-Control-Allow-Origin. Status code: 204 (pdc-service.nyc3.digitaloceanspaces.com, line 0)

@slifty How do we fix this, again?

slifty commented 6 months ago

Ah ha! I'll fix that ASAP -- it's a change in DigitalOcean, unfortunately.

slifty commented 6 months ago

@reefdog I just added https://app.philanthropydatacommons.org as a valid CORS origin. Can you give it another whirl?
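
For reference, the equivalent change via the S3-compatible API looks roughly like the sketch below (the bucket and endpoint are inferred from the error above, and the change can just as well be made in the DigitalOcean control panel):

import { S3Client, PutBucketCorsCommand } from '@aws-sdk/client-s3';

// Sketch only: allow the production app origin to talk to the Spaces bucket.
// Bucket and endpoint are inferred from the CORS error; credentials come from
// the same S3_* env vars the server uses.
const client = new S3Client({
  region: 'nyc3',
  endpoint: 'https://nyc3.digitaloceanspaces.com',
  credentials: {
    accessKeyId: process.env.S3_ACCESS_KEY_ID ?? '',
    secretAccessKey: process.env.S3_ACCESS_SECRET ?? '',
  },
});

await client.send(new PutBucketCorsCommand({
  Bucket: 'pdc-service',
  CORSConfiguration: {
    CORSRules: [{
      AllowedOrigins: ['https://app.philanthropydatacommons.org'],
      AllowedMethods: ['GET', 'PUT', 'POST'],
      AllowedHeaders: ['*'],
      MaxAgeSeconds: 3600,
    }],
  },
}));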

reefdog commented 6 months ago

Well, the upload worked, so your CORS fix is good, but the processing failed:

[screenshot: Failed CSV bulk upload]

Here's the file I used. It's @jmergy's, but reduced to just a single proposal (the first one).

jmergy commented 6 months ago

@reefdog @slifty thanks. I can work from that, whatever it will take. I have another ~100 or so; I will try those in small batches.

reefdog commented 6 months ago

@jmergy You should probably hold off until we figure out why processing isn't working. Although… feel free to test different combos to see if you get a success.

jmergy commented 6 months ago

@reefdog I did: a clean template with a single record/row containing just the columns I had data for (plus the required ones), and also one with the full complement of base fields and just the data I had for that row. All failed. Nothing fancy in the data either, even looking at it down to the single row/record. So something else must be up.

[screenshot of the error]

slifty commented 6 months ago

Cool -- will dig in! Thanks all.

jmergy commented 6 months ago

Refreshing my API connection and method to try to get some data in that way if I can't with bulk upload.

jmergy commented 6 months ago

Tried numerous different single records, and finally just two fields only, to try to rule out a single base field as the problem in the mix. No dice. Bulk upload is not working at all. @reefdog @slifty

[screenshots of the failed uploads]

slifty commented 6 months ago

I wonder if we don't have the proper env variables set right now (or if the access key we created doesn't have the proper permissions).

I'm working with @bickelj to get SSH access so I can view the logs, but in parallel...

Jesse, do you know if these env vars are being set:

# S3 Credentials
# For more information on populating these please see
# https://docs.digitalocean.com/products/spaces/reference/s3-sdk-examples/
S3_ACCESS_KEY_ID=${S3_ACCESS_KEY_ID}
S3_ACCESS_SECRET=${S3_ACCESS_SECRET}
S3_BUCKET=${S3_BUCKET}
S3_ENDPOINT=${S3_ENDPOINT}
S3_PATH_STYLE=${S3_PATH_STYLE} # `true` or `false`
S3_REGION=${S3_REGION}
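
(For context, those map more or less directly onto the S3 client configuration. A sketch, assuming the AWS SDK v3 client the DigitalOcean docs above describe, not the actual server code:)

import { S3Client } from '@aws-sdk/client-s3';

// Sketch of how the S3_* env vars would feed the client; not the actual server code.
const s3Client = new S3Client({
  region: process.env.S3_REGION,
  endpoint: process.env.S3_ENDPOINT,
  forcePathStyle: process.env.S3_PATH_STYLE === 'true',
  credentials: {
    accessKeyId: process.env.S3_ACCESS_KEY_ID ?? '',
    secretAccessKey: process.env.S3_ACCESS_SECRET ?? '',
  },
});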

bickelj commented 6 months ago

I see all six of those set in .env, and I see all six passed to the web container in the current compose script:

 grep S3 $( cat compose_current_file_name )
      - S3_ACCESS_KEY_ID=${S3_ACCESS_KEY_ID}
      - S3_ACCESS_SECRET=${S3_ACCESS_SECRET}
      - S3_ENDPOINT=${S3_ENDPOINT}
      - S3_PATH_STYLE=${S3_PATH_STYLE}
      - S3_REGION=${S3_REGION}
      - S3_BUCKET=${S3_BUCKET}

(As of this comment, https://raw.githubusercontent.com/PhilanthropyDataCommons/deploy/20240329-008e646/compose.yml)

And to be super-duper sure they made it into the container, I ran a shell on the web container and confirmed they all have values when I run env (printed here redacted):

web@f41e98d62226:~/server$ env | grep S3 | sort | cut -d'=' -f1
S3_ACCESS_KEY_ID
S3_ACCESS_SECRET
S3_BUCKET
S3_ENDPOINT
S3_PATH_STYLE
S3_REGION
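
That confirms the variables are present, but not that the key has the permissions it needs. A quick smoke test of that could look like the following sketch (assuming the AWS SDK v3 client; the test object key is arbitrary):

import { S3Client, HeadBucketCommand, PutObjectCommand } from '@aws-sdk/client-s3';

// Sketch: confirm the configured key can actually see and write to the bucket.
const client = new S3Client({
  region: process.env.S3_REGION,
  endpoint: process.env.S3_ENDPOINT,
  forcePathStyle: process.env.S3_PATH_STYLE === 'true',
  credentials: {
    accessKeyId: process.env.S3_ACCESS_KEY_ID ?? '',
    secretAccessKey: process.env.S3_ACCESS_SECRET ?? '',
  },
});

const bucket = process.env.S3_BUCKET ?? '';
await client.send(new HeadBucketCommand({ Bucket: bucket }));
await client.send(new PutObjectCommand({ Bucket: bucket, Key: 'permissions-check.txt', Body: 'ok' }));
console.log('Bucket is reachable and writable with the configured key.');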

bickelj commented 6 months ago

@slifty See the chat for some log-pasta.

slifty commented 6 months ago

The error being reported is

{
  "level": 30,
  "time": 1711745979754,
  "pid": 1,
  "hostname": "f41e98d62226",
  "source": "/opt/philanthropy-data-commons/server/dist/jobQueue.js",
  "err": {
    "type": "Error",
    "message": "organization_name is not a valid BaseField short code.",
    "stack": "Error: organization_name is not a valid BaseField short code.\n    at /opt/philanthropy-data-commons/server/dist/tasks/processBulkUpload.js:101:19\n    at Array.forEach (<anonymous>)\n    at assertShortCodesReferToExistingBaseFields (/opt/philanthropy-data-commons/server/dist/tasks/processBulkUpload.js:98:16)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async assertShortCodesAreValid (/opt/philanthropy-data-commons/server/dist/tasks/processBulkUpload.js:106:5)\n    at async assertCsvContainsValidShortCodes (/opt/philanthropy-data-commons/server/dist/tasks/processBulkUpload.js:113:5)\n    at async assertBulkUploadCsvIsValid (/opt/philanthropy-data-commons/server/dist/tasks/processBulkUpload.js:129:5)\n    at async processBulkUpload (/opt/philanthropy-data-commons/server/dist/tasks/processBulkUpload.js:222:9)\n    at async doNext (/opt/philanthropy-data-commons/server/node_modules/graphile-worker/dist/worker.js:194:26)"
  },
  "scope": {
    "label": "job",
    "workerId": "worker-6e20b5ceac2d414c52",
    "taskIdentifier": "processBulkUpload",
    "jobId": "11"
  },
  "msg": "Bulk upload has failed"
}

That said, organization_name appears in the base fields list and in the API. I'm looking in the logs now to see if there are other clues!
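
For context, the assertion in that stack trace is essentially comparing the short codes from the CSV header row against the base fields in the database. Paraphrased as a sketch (not the actual source):

// Paraphrased sketch of the check the stack trace points at; not the actual source.
const assertShortCodesReferToExistingBaseFields = (
  shortCodes: string[],
  baseFields: { shortCode: string }[],
): void => {
  const validShortCodes = new Set(baseFields.map((baseField) => baseField.shortCode));
  shortCodes.forEach((shortCode) => {
    if (!validShortCodes.has(shortCode)) {
      throw new Error(`${shortCode} is not a valid BaseField short code.`);
    }
  });
};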

slifty commented 6 months ago

A few things:

1) proposal_title is not currently a base field in production, but it is in the template. The short-term solution would be to add it via the API.

2) @jmergy right now I don't know why your simple two-column upload didn't work -- I was able to upload the following and it did process:

[screenshot of the test CSV that processed successfully]

Can you try one last time?

The error log I posted above does indicate that organization_name was not a valid base field at whatever time you uploaded, however...

As a sanity check: did anybody add that base field to the system between then and now?
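
One quick way to check what production currently has is to diff the template's header row against the base fields endpoint. A sketch (the API base URL and response shape are assumptions, and the endpoint may require an auth token):

// Sketch: compare a CSV header row against the base fields the API reports.
// The API base URL and the response shape (objects with a shortCode property)
// are assumptions; adjust to the real schema and add a bearer token if required.
const findUnknownShortCodes = async (headerRow: string[]): Promise<string[]> => {
  const response = await fetch('https://api.philanthropydatacommons.org/baseFields');
  const baseFields = (await response.json()) as { shortCode: string }[];
  const known = new Set(baseFields.map((baseField) => baseField.shortCode));
  return headerRow.filter((shortCode) => !known.has(shortCode));
};

console.log(await findUnknownShortCodes(['organization_name', 'proposal_title']));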

jmergy commented 6 months ago

Cool. I will test next time and report back.

jmergy commented 6 months ago

Was able to get data in earlier this week. I have some concerns about the fail-state behavior, but I'll open a new issue if that comes up next time.

jmergy commented 6 months ago

Thanks!

reefdog commented 6 months ago

@jmergy Please do open a new issue documenting your issues with the fail state! I know of some generic problems (like: we don't report useful errors back to the user!), but it would be good to capture things more specifically.