ajtucker opened this issue 4 years ago
Build 'GSS_data/Trade/HMRC_RTS' is failing!
Last 50 lines of build output:
[...truncated 639 lines...]
Masking supported pattern matches of $CACHE_USER or $CACHE_PASS
[Pipeline] {
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
Deleting 1 temporary files
[Pipeline] // configFileProvider
[Pipeline] echo
Finding job draft
[Pipeline] configFileProvider
provisioning config files...
copy managed file [JsonConfig] to file:/var/jenkins_home/workspace/GSS_data/Trade/HMRC_RTS@tmp/config3846629138399847720tmp
[Pipeline] {
[Pipeline] readFile
[Pipeline] readJSON
[Pipeline] withCredentials
Masking supported pattern matches of $USER or $PASS
[Pipeline] {
[Pipeline] withCredentials
Masking supported pattern matches of $CACHE_USER or $CACHE_PASS
[Pipeline] {
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
Deleting 1 temporary files
[Pipeline] // configFileProvider
[Pipeline] }
[Pipeline] // script
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Test draft dataset)
Stage "Test draft dataset" skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Publish)
Stage "Publish" skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Declarative: Post Actions)
[Pipeline] script
[Pipeline] {
[Pipeline] step
Changes since last successful build: No changes
[vtula2000] 273dcd95b58f04df5c4f2706e1fe6f03b50f7f83 - Update 098
[Alex Tucker] efbaefb757b0c0cc2d614e26cd50e4b4c4d1fab3 - Rename to main.ipynb
[Alex Tucker] 273940d43c8f6832c51337f57da0c15622bb6bee - Remove old scripts.
[Alex Tucker] b3ac99cab09df6d17dce7008c583fb6f50b2991e - Remove old metadata template.
[Alex Tucker] bb6ed24923224bbde2472e03200ad8a4a5ba431c - Use gssutils to fetch metadata and file links from uktradeinfo.com
[Alex Tucker] fbda31247ef057dd1e6dc3e58cd5196182f4efcd - Use updated Jenkins pipeline. Note, there's no schema.json yet, so this
[Alex Tucker] 639e60af9f29c30d973ab8ae51002b95537176f2 - Use filenames without spaces (csvlint doesn't like spaces apparently).
[Alex Tucker] 716deaf524fb37e649a7e1ebb0bea9e47947bc62 - Add a CSVW schema to validate (some) output.
[Alex Tucker] ba4b44a7adfdaf7b1137dae84f705ea4f3108559 - Add trigger so build runs when ref_trade is successfully published.
[mikelovesbooks] 32a7c901b5b9f2c8e9512ab95e9f7c0fa68ffa98 - switched to creating schema.json's dynamically
[mikelovesbooks] dc344c54a1dac79f3c7995dc046140a28c110f51 - switched from .ipynb to .py
[mikelovesbooks] 920045cccf47205dcf30020802c6f9b254416f70 - updated Jenkinsfile to use template
[mikelovesbooks] 3efebd3131d2c9221881e9a5b1804b117bb0cb06 - lowercase imports and exports
[Alex Tucker] 5f5d4b46d7d338090ecfa2d02ede7ae8c0a498da - Ignore RequestAborted exceptions in empty_cache/sync_search calls.
We may need to rethink this one, as the Tidy CSV is now about 250MB and the csvlint step gets killed after half an hour.
Ultimately, while the conventional pipeline should work, for this kind of data that accrues over time there are better ways to incrementally update the eventual data cube.
I realise this'll come down to priorities, and you're making a wider point here, but purely in terms of csvlint: the checks we're doing (as I understand them) could be done fairly easily in Python with pandas, and built to handle files of any size. It could also be nice to control what's checked and how.
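A rough sketch of the pandas approach: reading in chunks keeps memory bounded regardless of file size, so a 250MB Tidy CSV is no problem. The column names and rules below are hypothetical, not the real observations.csv schema.

```python
import pandas as pd

def check_csv(path, required_columns, chunksize=100_000):
    """Return a list of validation errors found in the CSV at `path`.

    Reads the file in chunks so memory use is bounded by `chunksize`
    rows, not by the size of the file.
    """
    errors = []
    reader = pd.read_csv(path, chunksize=chunksize, dtype=str)
    for i, chunk in enumerate(reader):
        if i == 0:
            # Header check only needs the first chunk.
            missing = set(required_columns) - set(chunk.columns)
            if missing:
                errors.append(f"missing columns: {sorted(missing)}")
                break
        for col in required_columns:
            # Empty CSV fields come through as NaN even with dtype=str.
            n_blank = chunk[col].isna().sum()
            if n_blank:
                errors.append(f"{col}: {n_blank} empty value(s) in chunk {i}")
    return errors
```

Each chunk is an ordinary DataFrame, so any per-column check (regex patterns, numeric ranges, allowed code lists) slots into the same loop, and runtime scales linearly rather than blowing the half-hour timeout.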
We certainly could, and probably will need to in this case in the short term, as the csvlint Ruby app won't cope.
Longer term I'd like to keep CSV + CSVW JSON as the standard data interface, much like Unix pipes have text as their standard data interface.
We should be able to come up with a better CSV lint as a separate process from the data transformation process, keeping the concerns separated by this data interface.
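Keeping CSV + CSVW JSON as the interface, a replacement lint step could be driven directly by the existing schema file rather than hard-coding checks. A minimal sketch, handling only a tiny subset of CSVW (required columns and the integer datatype), and assuming `titles` is a single string per column, which the CSVW spec doesn't actually require:

```python
import json

import pandas as pd

def lint_with_csvw(csv_path, schema_path, chunksize=100_000):
    """Validate csv_path against a small subset of a CSVW tableSchema."""
    with open(schema_path) as f:
        schema = json.load(f)
    columns = schema["tableSchema"]["columns"]
    errors = []
    for i, chunk in enumerate(pd.read_csv(csv_path, chunksize=chunksize,
                                          dtype=str)):
        for col in columns:
            # CSVW allows titles to be a string or a list; assume a
            # single string here for simplicity.
            name = col["titles"]
            if name not in chunk.columns:
                if i == 0:
                    errors.append(f"missing column: {name}")
                continue
            if col.get("required") and chunk[name].isna().any():
                errors.append(f"{name}: empty values in chunk {i}")
            if col.get("datatype") == "integer":
                bad = ~chunk[name].dropna().str.fullmatch(r"-?\d+")
                if bad.any():
                    errors.append(f"{name}: non-integer values in chunk {i}")
    return errors
```

Because the checks are derived from the same schema.json the pipeline already produces, the transform and the lint stay decoupled: either side can change so long as the CSVW contract holds.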
Build 'GSS_data/Trade/HMRC_RTS' is failing!
Last 50 lines of build output:
[...truncated 129 lines...]
[Pipeline] getContext
[Pipeline] isUnix
[Pipeline] sh
+ docker inspect -f . gsscogs/csvlint
.
[Pipeline] withDockerContainer
Jenkins seems to be running inside container 1b40f53939a3e15e7cc68db705207edcc31184006b37a008267162fee93a4751
$ docker run -t -d -u 1000:1000 -w /var/jenkins_home/workspace/GSS_data/Trade/HMRC_RTS --volumes-from 1b40f53939a3e15e7cc68db705207edcc31184006b37a008267162fee93a4751 -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** gsscogs/csvlint cat
$ docker top a621926f609dc71f33491d72b0b0b562e518110a2bc5be380be3378ca7664dc2 -eo pid,comm
[Pipeline] {
[Pipeline] script
[Pipeline] {
[Pipeline] ansiColor
[Pipeline] {
[Pipeline] fileExists
[Pipeline] findFiles
[Pipeline] sh
+ csvlint --no-verbose -s out/observations.csv-schema.json
Killed
[Pipeline] }
[Pipeline] // ansiColor
[Pipeline] }
[Pipeline] // script
[Pipeline] }
$ docker stop --time=1 a621926f609dc71f33491d72b0b0b562e518110a2bc5be380be3378ca7664dc2
$ docker rm -f a621926f609dc71f33491d72b0b0b562e518110a2bc5be380be3378ca7664dc2
[Pipeline] // withDockerContainer
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Upload Tidy Data)
Stage "Upload Tidy Data" skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Test draft dataset)
Stage "Test draft dataset" skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Publish)
Stage "Publish" skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Declarative: Post Actions)
[Pipeline] script
[Pipeline] {
[Pipeline] step
Changes since last successful build: No changes
Build 'GSS_data/Trade/HMRC_RTS' is failing!
Last 50 lines of build output:
Changes since last successful build: No changes
View full output