GSS-Cogs / HMRC_RTS

HMRC_Regional Trade Statistics data

GSS_data/Trade/HMRC_RTS #30 failed #6

Open ajtucker opened 4 years ago

ajtucker commented 4 years ago

Build 'GSS_data/Trade/HMRC_RTS' is failing!

Last 50 lines of build output:

[...truncated 635 lines...]
Masking supported pattern matches of $CACHE_USER or $CACHE_PASS
[Pipeline] {
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
Deleting 1 temporary files
[Pipeline] // configFileProvider
[Pipeline] echo
Finding job draft
[Pipeline] configFileProvider
provisioning config files...
copy managed file [JsonConfig] to file:/var/jenkins_home/workspace/GSS_data/Trade/HMRC_RTS@tmp/config4600260788839930033tmp
[Pipeline] {
[Pipeline] readFile
[Pipeline] readJSON
[Pipeline] withCredentials
Masking supported pattern matches of $USER or $PASS
[Pipeline] {
[Pipeline] withCredentials
Masking supported pattern matches of $CACHE_USER or $CACHE_PASS
[Pipeline] {
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
Deleting 1 temporary files
[Pipeline] // configFileProvider
[Pipeline] }
[Pipeline] // script
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Test draft dataset)
Stage "Test draft dataset" skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Publish)
Stage "Publish" skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Declarative: Post Actions)
[Pipeline] script
[Pipeline] {
[Pipeline] step

Changes since last successful build: No changes

View full output

ajtucker commented 4 years ago

Build 'GSS_data/Trade/HMRC_RTS' is failing!

Last 50 lines of build output:

[...truncated 639 lines...]
Masking supported pattern matches of $CACHE_USER or $CACHE_PASS
[Pipeline] {
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
Deleting 1 temporary files
[Pipeline] // configFileProvider
[Pipeline] echo
Finding job draft
[Pipeline] configFileProvider
provisioning config files...
copy managed file [JsonConfig] to file:/var/jenkins_home/workspace/GSS_data/Trade/HMRC_RTS@tmp/config3846629138399847720tmp
[Pipeline] {
[Pipeline] readFile
[Pipeline] readJSON
[Pipeline] withCredentials
Masking supported pattern matches of $USER or $PASS
[Pipeline] {
[Pipeline] withCredentials
Masking supported pattern matches of $CACHE_USER or $CACHE_PASS
[Pipeline] {
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
[Pipeline] // withCredentials
[Pipeline] }
Deleting 1 temporary files
[Pipeline] // configFileProvider
[Pipeline] }
[Pipeline] // script
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Test draft dataset)
Stage "Test draft dataset" skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Publish)
Stage "Publish" skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Declarative: Post Actions)
[Pipeline] script
[Pipeline] {
[Pipeline] step

Changes since last successful build: No changes

View full output

ajtucker commented 4 years ago

We may need to rethink this one as the Tidy CSV is now about 250MB and the csvlint step gets killed after half an hour.

Ultimately, while the conventional pipeline should work, for this kind of data, which accrues over time, there are more efficient ways to update the eventual data cube; a sketch of one follows.
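A hedged sketch of what that could look like: rather than re-validating and re-uploading the whole cube each time, process only the rows for newly added periods. This assumes the tidy CSV has a `Period` column and that a snapshot of the last successfully published file is kept around; both are assumptions for illustration, not how the pipeline currently works.

```python
import pandas as pd

# Hypothetical paths: this build's tidy output and a snapshot of the
# last successfully published observations file.
NEW_CSV = "out/observations.csv"
PREVIOUS_CSV = "previous/observations.csv"

# Read only the (assumed) Period column of each file to work out which
# periods are new, rather than loading both ~250MB files in full.
new_periods = set(pd.read_csv(NEW_CSV, usecols=["Period"])["Period"])
old_periods = set(pd.read_csv(PREVIOUS_CSV, usecols=["Period"])["Period"])
added = new_periods - old_periods

# Stream the full file in chunks, keeping only rows for the new periods,
# so memory use stays bounded regardless of file size.
chunks = pd.read_csv(NEW_CSV, chunksize=100_000)
delta = pd.concat(chunk[chunk["Period"].isin(added)] for chunk in chunks)
delta.to_csv("out/observations-delta.csv", index=False)
```

The delta file could then be appended to the existing cube instead of replacing it wholesale.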

mikeAdamss commented 4 years ago

I realise this'll come down to priorities, and you're making a wider point here, but solely in terms of csvlint: the checks we're doing (as I understand them) could be done in Python pandas pretty easily and built to handle any file size. It could also be nice to control what's checked and how.
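For illustration, a minimal pandas sketch of that kind of check, reading the CSV in bounded-size chunks so a ~250MB file never has to fit in memory at once. The expected column list and the numeric-column list here are placeholders, not the real schema:

```python
import pandas as pd

CSV_PATH = "out/observations.csv"

# Placeholder expectations; in practice these would be derived from the
# CSVW schema rather than hard-coded.
EXPECTED_COLUMNS = ["Period", "Flow", "Region", "Value"]
NUMERIC_COLUMNS = ["Value"]

errors = []
for i, chunk in enumerate(pd.read_csv(CSV_PATH, chunksize=100_000, dtype=str)):
    # Check the header once, on the first chunk.
    if i == 0 and list(chunk.columns) != EXPECTED_COLUMNS:
        errors.append(f"unexpected columns: {list(chunk.columns)}")
        break
    # Flag any empty cells.
    if chunk.isna().any().any():
        errors.append(f"missing values in chunk {i}")
    # Flag values in numeric columns that don't parse as numbers.
    for col in NUMERIC_COLUMNS:
        bad = pd.to_numeric(chunk[col], errors="coerce").isna() & chunk[col].notna()
        if bad.any():
            errors.append(f"non-numeric values in column {col}, chunk {i}")

if errors:
    raise SystemExit("\n".join(errors))
```

Because each chunk is discarded before the next is read, peak memory depends on the chunk size rather than the file size.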

ajtucker commented 4 years ago

We certainly could, and probably will need to do that in this case in the short term, as the csvlint Ruby app won't cope.

Longer term, I'd like to keep CSV + CSVW JSON as the standard data interface, much as Unix pipes use plain text as theirs.

We should be able to come up with a better CSV linter as a process separate from the data transformation, keeping the two concerns decoupled by that data interface; a sketch of what that could look like follows.
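A minimal sketch of such a standalone checker, assuming the schema file is CSVW metadata with a `tableSchema`/`columns` section (and handling datatypes not at all): it streams the CSV row by row, so memory use stays flat however large the file grows.

```python
import csv
import json
import sys

def lint(csv_path: str, schema_path: str) -> int:
    """Check a CSV's header and field counts against its CSVW metadata."""
    with open(schema_path) as f:
        schema = json.load(f)

    # CSVW allows "titles" to be a string or a list; take the first title,
    # falling back to "name". A simplifying assumption for this sketch.
    expected = []
    for col in schema["tableSchema"]["columns"]:
        t = col.get("titles", col.get("name"))
        expected.append(t[0] if isinstance(t, list) else t)

    problems = 0
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        if header != expected:
            print(f"header mismatch: {header} != {expected}")
            problems += 1
        for n, row in enumerate(reader, start=2):
            if len(row) != len(expected):
                print(f"line {n}: expected {len(expected)} fields, got {len(row)}")
                problems += 1
    return problems

if __name__ == "__main__":
    sys.exit(1 if lint(sys.argv[1], sys.argv[2]) else 0)
```

Something like `python csvw_lint.py out/observations.csv out/observations.csv-schema.json` could then stand in for the csvlint step, if that schema file is indeed CSVW metadata.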


ajtucker commented 4 years ago

Build 'GSS_data/Trade/HMRC_RTS' is failing!

Last 50 lines of build output:

[...truncated 129 lines...]
[Pipeline] getContext
[Pipeline] isUnix
[Pipeline] sh
+ docker inspect -f . gsscogs/csvlint
.
[Pipeline] withDockerContainer
Jenkins seems to be running inside container 1b40f53939a3e15e7cc68db705207edcc31184006b37a008267162fee93a4751
$ docker run -t -d -u 1000:1000 -w /var/jenkins_home/workspace/GSS_data/Trade/HMRC_RTS --volumes-from 1b40f53939a3e15e7cc68db705207edcc31184006b37a008267162fee93a4751 -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** -e ******** gsscogs/csvlint cat
$ docker top a621926f609dc71f33491d72b0b0b562e518110a2bc5be380be3378ca7664dc2 -eo pid,comm
[Pipeline] {
[Pipeline] script
[Pipeline] {
[Pipeline] ansiColor
[Pipeline] {
[Pipeline] fileExists
[Pipeline] findFiles
[Pipeline] sh
+ csvlint --no-verbose -s out/observations.csv-schema.json
Killed
[Pipeline] }
[Pipeline] // ansiColor
[Pipeline] }
[Pipeline] // script
[Pipeline] }
$ docker stop --time=1 a621926f609dc71f33491d72b0b0b562e518110a2bc5be380be3378ca7664dc2
$ docker rm -f a621926f609dc71f33491d72b0b0b562e518110a2bc5be380be3378ca7664dc2
[Pipeline] // withDockerContainer
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Upload Tidy Data)
Stage "Upload Tidy Data" skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Test draft dataset)
Stage "Test draft dataset" skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Publish)
Stage "Publish" skipped due to earlier failure(s)
[Pipeline] }
[Pipeline] // stage
[Pipeline] stage
[Pipeline] { (Declarative: Post Actions)
[Pipeline] script
[Pipeline] {
[Pipeline] step

Changes since last successful build: No changes


View full output