USGS-CMG / usgs-cmg-portal


gamone pycsw issues #309

Closed dnowacki-usgs closed 5 years ago

dnowacki-usgs commented 5 years ago

Issues in several of the steps:

pycsw_wipe

Gives RuntimeError: ERROR: too many SQL variables and never cleanly finishes.

Traceback (most recent call last):
  File "bin/pycsw-admin.py", line 286, in <module>
    admin.delete_records(CONTEXT, DATABASE, TABLE)
  File "/opt/pycsw/pycsw/core/admin.py", line 583, in delete_records
    repo.delete(constraint={'where': '', 'values': []})
  File "/opt/pycsw/pycsw/core/repository.py", line 383, in delete
    raise RuntimeError('ERROR: %s' % str(err.orig))
RuntimeError: ERROR: too many SQL variables

Proposed solution: Unclear; we need to either raise sqlite's SQLITE_MAX_VARIABLE_NUMBER limit (a compile-time option, not a runtime argument) or change the query so that it binds fewer variables.
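To illustrate the "fewer variables" option, here is a minimal sketch that deletes records in batches so no single statement binds more than a fixed number of parameters. It assumes the failure comes from a statement that binds one variable per record, which the traceback does not confirm. The helper name delete_in_chunks, the chunk size of 500, and the demo identifiers are hypothetical; only the records table and its identifier column come from the output in this issue.

```python
import sqlite3


def delete_in_chunks(conn, identifiers, chunk_size=500):
    """Delete rows by identifier, binding at most chunk_size variables per statement.

    Hypothetical helper, not part of pycsw.
    """
    deleted = 0
    for start in range(0, len(identifiers), chunk_size):
        chunk = identifiers[start:start + chunk_size]
        # One "?" placeholder per identifier in this batch.
        placeholders = ",".join("?" * len(chunk))
        cur = conn.execute(
            f"DELETE FROM records WHERE identifier IN ({placeholders})", chunk
        )
        deleted += cur.rowcount
    conn.commit()
    return deleted


# Demo against an in-memory database with made-up identifiers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (identifier TEXT PRIMARY KEY)")
ids = [f"gov.usgs.cmgp:demo-{i}" for i in range(2656)]
conn.executemany("INSERT INTO records VALUES (?)", [(i,) for i in ids])

deleted = delete_in_chunks(conn, ids)
remaining = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(deleted, remaining)
```

A batch size well under 999 is chosen here because that was the default variable limit in older SQLite builds; newer builds allow more.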

pycsw_load

Many of the records give the SQL error ERROR: not inserted ERROR: UNIQUE constraint failed: records.identifier:

Processing file /store/iso_records/ts/silt_usgs_Projects_stellwagen_CF-1.6_NEARSHORE_8791aqd-a.nc.iso.xml (2096 of 2656)
Serialized metadata, parsing content model
Scanning for links
Scanning for gmd:transferOptions element(s)
Scanning for gmd:distributorTransferOptions element(s)
Scanning for srv:SV_ServiceIdentification links
adding srv link https://gamone.whoi.edu/thredds/dodsC/silt/usgs/Projects/stellwagen/CF-1.6/NEARSHORE/8791aqd-a.nc
adding srv link https://gamone.whoi.edu/thredds/wms/silt/usgs/Projects/stellwagen/CF-1.6/NEARSHORE/8791aqd-a.nc?service=WMS&version=1.3.0&request=GetCapabilities
adding srv link https://gamone.whoi.edu/thredds/ncss/grid/silt/usgs/Projects/stellwagen/CF-1.6/NEARSHORE/8791aqd-a.nc/dataset.html
adding srv link https://gamone.whoi.edu/thredds/fileServer/silt/usgs/Projects/stellwagen/CF-1.6/NEARSHORE/8791aqd-a.nc
Inserting gmd:MD_Metadata gov.usgs.cmgp:8791aqd-a into database sqlite:////database/cite.db, table records ....
ERROR: not inserted ERROR: UNIQUE constraint failed: records.identifier

So something (maybe essential, maybe not) is not getting added to cite.db.

Edit: I think this is because pycsw_wipe is failing above, so pycsw_load is trying to re-write records that already exist.

Proposed solution: This isn't actually a problem with pycsw_load; we just need to fix pycsw_wipe.
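The reasoning above can be reproduced in isolation. The sketch below uses a toy in-memory table, not the real pycsw schema: it only assumes that records.identifier carries a UNIQUE constraint, as the error message indicates. The load helper is hypothetical.

```python
import sqlite3

# Toy stand-in for cite.db; the real pycsw table has many more columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (identifier TEXT UNIQUE, xml TEXT)")


def load(conn, identifier, xml):
    """Hypothetical loader: insert one record and report what happened."""
    try:
        conn.execute("INSERT INTO records VALUES (?, ?)", (identifier, xml))
        return "inserted"
    except sqlite3.IntegrityError as err:
        return f"not inserted: {err}"


# First load succeeds.
first = load(conn, "gov.usgs.cmgp:8791aqd-a", "<xml/>")
# Second load without a wipe in between hits the UNIQUE constraint.
second = load(conn, "gov.usgs.cmgp:8791aqd-a", "<xml/>")
# After a successful wipe, the same record loads again.
conn.execute("DELETE FROM records")
third = load(conn, "gov.usgs.cmgp:8791aqd-a", "<xml/>")

print(first, "|", second, "|", third)
```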

pycsw_force

OK

pycsw_optimize

OK

pycsw_export

After successfully writing several XML files, dies with the following error:

Traceback (most recent call last):
  File "bin/pycsw-admin.py", line 266, in <module>
    admin.export_records(CONTEXT, DATABASE, TABLE, XML_DIRPATH)
  File "/opt/pycsw/pycsw/core/admin.py", line 403, in export_records
    raise RuntimeError("Error writing to %s" % filename, err)
RuntimeError: ('Error writing to /export/estofs/atlantic/1.1.0.xml', FileNotFoundError(2, 'No such file or directory'))

I'm guessing this error occurs because it's trying to write a file under a path (/export/estofs/atlantic/) that does not exist (only /export exists, not the subdirectories). Either we need to rewrite the filename, maybe using underscores instead of slashes, or we need to create the directory first and then write the file.
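Both options can be sketched in a few lines. This is an illustration only: export_path is a hypothetical helper, and it assumes the export filename is simply the identifier plus ".xml" under the export directory, which matches the path in the error but is not confirmed against pycsw's code.

```python
import tempfile
from pathlib import Path


def export_path(xml_dir, identifier, flatten=False):
    """Hypothetical helper: build a writable export path for an identifier.

    flatten=True replaces slashes with underscores; otherwise the
    missing parent directories are created before the file is written.
    """
    name = identifier.replace("/", "_") if flatten else identifier
    path = Path(xml_dir) / f"{name}.xml"
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Demo in a temporary directory standing in for /export.
with tempfile.TemporaryDirectory() as tmp:
    nested = export_path(tmp, "estofs/atlantic/1.1.0")
    nested.write_text("<xml/>")
    flat = export_path(tmp, "estofs/atlantic/1.1.0", flatten=True)
    flat.write_text("<xml/>")
    nested_rel = nested.relative_to(tmp).as_posix()
    flat_rel = flat.relative_to(tmp).as_posix()

print(nested_rel)
print(flat_rel)
```

Creating the directories keeps identifiers and filenames in one-to-one correspondence, while flattening keeps every export in a single directory at the cost of possible name collisions.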

Proposed solution: Remove the records whose identifier contains a /. Currently, these are:

sqlite> select identifier from records where  identifier like '%/%';
estofs/atlantic/1.1.0
fmrc/us_east/US_East_Forecast_Model_Run_Collection_best.ncd
fvcom/archives/necofs_gom3_wave
fvcom/archives/necofs_gom3v13
fvcom/archives/necofs_mb
usgs/data2/rsignell/data/ssh.nc

Other

I'm still not convinced the cron job is actually running properly, as pycsw is not a binary on the path specified in the crontab. The crontab as written passes pycsw_wipe etc. as arguments to pycsw, when in fact pycsw_wipe and friends are themselves binaries on the path (located in /usr/local/bin).
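One way to check this is to resolve each command name against the same PATH the cron job uses. The sketch below relies on the standard-library shutil.which; the PATH string is an assumption and should be replaced with the value actually set in the crontab, and find_commands is a hypothetical helper.

```python
import shutil


def find_commands(names, path=None):
    """Map each command name to its resolved location, or None if not found."""
    return {name: shutil.which(name, path=path) for name in names}


steps = [
    "pycsw",
    "pycsw_wipe",
    "pycsw_load",
    "pycsw_force",
    "pycsw_optimize",
    "pycsw_export",
]

# Assumed PATH for illustration; substitute the crontab's real PATH value.
resolved = find_commands(steps, path="/usr/local/bin:/usr/bin:/bin")
for name, location in resolved.items():
    print(name, "->", location or "NOT FOUND")
```

If pycsw resolves to nothing while the pycsw_* names resolve under /usr/local/bin, the crontab entries should call those binaries directly.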

rsignell-usgs commented 5 years ago

@tomkralidis, do you have a suggestion on how we should fix the ERROR: too many SQL variables error? Basically we are just trying to wipe all the records in the database.

tomkralidis commented 5 years ago

@rsignell-usgs which version of pycsw? Can you open a ticket on pycsw's issue tracker? Otherwise, the quick workaround is to simply run "delete from records" directly in SQL.

rsignell-usgs commented 5 years ago

@dnowacki-usgs, can you carry the ball here?

dnowacki-usgs commented 5 years ago

Thanks @tomkralidis! Direct SQL query works; I opened an issue over on the pycsw tracker.

@rsignell-usgs the wipe hadn't been successful for a long time. Manually clearing and re-loading got rid of most of the entries with / in them; now only usgs/data2/rsignell/data/ssh.nc remains. I manually deleted that record and was able to get pycsw_export to run successfully. I'm not sure where this record is being harvested from, but if we can find and remove it at the source, all the steps should complete successfully (once we get the wipe step working reliably).

dnowacki-usgs commented 5 years ago

Closing this as things seem to be working smoothly using Postgres as the DB.