Big-Bee-Network / imageseq

tracks and packages image sequences from Big Bee specimens
Creative Commons Zero v1.0 Universal

not a valid identifier error #1

Closed seltmann closed 1 year ago

seltmann commented 1 year ago

@jhpoelen Preston is installed. For this catalog number, I am getting a "not a valid identifier" error.

sh create-imageseq.sh "UCSB-IZC00046805" "https://library.big-bee.net/portal/content/dwca/UCSB-IZC_DwC-A.zip"

jhpoelen commented 1 year ago

@seltmann thanks for your message.

What version of jq do you have installed?

jhpoelen commented 1 year ago

On running:

bash create-imageseq.sh "UCSB-IZC00046805" "https://library.big-bee.net/portal/content/dwca/UCSB-IZC_DwC-A.zip" &> imageseq-UCSB-IZC00046805.log

I got attached logs as a result.

imageseq-UCSB-IZC00046805.log.txt

jhpoelen commented 1 year ago

Also, I selected the first media record for this catalog number with:

preston ls | preston dwc-stream | grep "UCSB-IZC00046805" | grep media | head -n1 | jq .

Note that the image URLs have 3x in the name instead of the expected 3d. Would you like me to add 3x as a 3d image name tag?

{
  "http://www.w3.org/ns/prov#wasDerivedFrom": "line:zip:hash://sha256/1aed694cfd6d5bd28d419e4fe35a63715e6a677936c2288d26c774addb95dda0!/multimedia.csv!/L87",
  "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "http://rs.tdwg.org/ac/terms/Multimedia",
  "http://rs.tdwg.org/dwc/text/coreid": "2128110",
  "http://ns.adobe.com/xap/1.0/rights/UsageTerms": "CC0 1.0 (Public-domain)",
  "http://rs.tdwg.org/ac/terms/accessURI": "https://serv.biokic.asu.edu/imglib/ecdysis/UCSB_IZC/UCSB-IZC00046/UCSB-IZC00046805_3x_had_lg.jpg",
  "http://ns.adobe.com/xap/1.0/rights/WebStatement": null,
  "http://rs.tdwg.org/ac/terms/associatedSpecimenReference": "https://library.big-bee.net/portal/collections/individual/index.php?occid=2128110",
  "http://purl.org/dc/terms/type": "StillImage",
  "http://purl.org/dc/elements/1.1/creator": null,
  "http://rs.tdwg.org/ac/terms/comments": null,
  "http://purl.org/dc/terms/rights": "http://creativecommons.org/publicdomain/zero/1.0/",
  "http://rs.tdwg.org/ac/terms/thumbnailAccessURI": "https://serv.biokic.asu.edu/imglib/ecdysis/UCSB_IZC/UCSB-IZC00046/UCSB-IZC00046805_3x_had_tn.jpg",
  "http://purl.org/dc/terms/identifier": "https://serv.biokic.asu.edu/imglib/ecdysis/UCSB_IZC/UCSB-IZC00046/UCSB-IZC00046805_3x_had_lg.jpg",
  "http://rs.tdwg.org/ac/terms/subtype": "Photograph",
  "http://rs.tdwg.org/ac/terms/metadataLanguage": "en",
  "http://purl.org/dc/terms/format": "image/jpeg",
  "http://ns.adobe.com/xap/1.0/rights/Owner": "University of California Santa Barbara Invertebrate Zoology Collection (UCSB-IZC)",
  "http://rs.tdwg.org/ac/terms/caption": null,
  "http://ns.adobe.com/xap/1.0/MetadataDate": "2022-10-31 14:38:13",
  "http://rs.tdwg.org/ac/terms/providerManagedID": "urn:uuid:5813deb0-9767-4855-b6bf-c7f28cbadf12",
  "http://rs.tdwg.org/ac/terms/goodQualityAccessURI": "https://serv.biokic.asu.edu/imglib/ecdysis/UCSB_IZC/UCSB-IZC00046/UCSB-IZC00046805_3x_had.jpg"
}

jhpoelen commented 1 year ago

and

$ preston ls | preston dwc-stream | grep "UCSB-IZC00046805" | grep media | grep _3d_  | wc -l
0

meaning that no _3d_ text exists in the media records related to UCSB-IZC00046805.
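
The check above could be widened to count both name tags at once. This is only a minimal sketch: the function name `count_sequence_records` and the sample lines are made up for illustration, and treating `_3x_` as an image sequence tag is an assumption that this thread does not confirm.

```shell
#!/bin/sh
# Count the lines on stdin that contain either the _3d_ or the _3x_ name tag.
# Assumption: _3x_ marks image sequence frames the same way _3d_ does.
count_sequence_records() {
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  grep -c -E '_(3d|3x)_' || true
}

# Stand-in sample lines. In practice the input would come from:
#   preston ls | preston dwc-stream | grep "UCSB-IZC00046805" | grep media
printf '%s\n' \
  '{"accessURI":"https://example.org/UCSB-IZC00046805_3x_had_lg.jpg"}' \
  '{"accessURI":"https://example.org/UCSB-IZC00035429_3d_001.jpg"}' \
  '{"accessURI":"https://example.org/UCSB-IZC00000001_lat.jpg"}' \
  | count_sequence_records
```

With the three sample lines, the count is 2: one `_3x_` record and one `_3d_` record.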

jhpoelen commented 1 year ago

@seltmann hope this helps. If not, please holler and I can try again to figure this out.

seltmann commented 1 year ago

thanks @jhpoelen

My jq version is 1.6.

I ran again using a catalogNumber I know has 3D and I am still getting the error.

sh create-imageseq.sh "UCSB-IZC00035429" "https://library.big-bee.net/portal/content/dwca/UCSB-IZC_DwC-A.zip"
+ CATALOG_NUMBER=UCSB-IZC00035429
+ DWC_URL=https://library.big-bee.net/portal/content/dwca/UCSB-IZC_DwC-A.zip
+ DIST_DIR=dist/UCSB-IZC00035429
+ mkdir -p dist/UCSB-IZC00035429
+ TMP_DIR=tmp/UCSB-IZC00035429
+ mkdir -p tmp/UCSB-IZC00035429
+ OPTS='--data-dir tmp/UCSB-IZC00035429/data'
create-imageseq.sh: line 30: `track-collection-extract-images': not a valid identifier

(base) katjaseltmann@registered-74 imageseq %

I am looking forward to getting this to work and then changing the script to find other kinds of images....

seltmann commented 1 year ago

@jhpoelen I got a bit further by setting my path, but there is no output. The dist/UCSB-IZC00035429 folder is empty.

create-imageseq.sh "UCSB-IZC00035429" "https://library.big-bee.net/portal/content/dwca/UCSB-IZC_DwC-A.zip" 
+ CATALOG_NUMBER=UCSB-IZC00035429
/Users/katjaseltmann/Documents/imageseq/create-imageseq.sh: line 12: 2: //library.big-bee.net/portal/content/dwca/UCSB-IZC_DwC-A.zip: syntax error: operand expected (error token is "//library.big-bee.net/portal/content/dwca/UCSB-IZC_DwC-A.zip")
+ DIST_DIR=dist/UCSB-IZC00035429
+ mkdir -p dist/UCSB-IZC00035429
+ TMP_DIR=tmp/UCSB-IZC00035429
+ mkdir -p tmp/UCSB-IZC00035429
+ OPTS='--data-dir tmp/UCSB-IZC00035429/data'
+ track-collection-extract-images
+ preston track --data-dir tmp/UCSB-IZC00035429/data ''
+ preston dwc-stream --data-dir tmp/UCSB-IZC00035429/data
+ grep UCSB-IZC00035429
+ grep _3d_
+ jq --raw-output '.["http://rs.tdwg.org/ac/terms/accessURI"]'
+ xargs -L25 preston track --data-dir tmp/UCSB-IZC00035429/data
[main] WARN bio.guoda.preston.store.Archiver - failed to dereference [<>]
org.apache.http.client.ClientProtocolException
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:187)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
    at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
    at bio.guoda.preston.ResourcesHTTP.asInputStream(ResourcesHTTP.java:81)
    at bio.guoda.preston.ResourcesHTTP.asInputStream(ResourcesHTTP.java:67)
    at bio.guoda.preston.ResourcesHTTP.asInputStream(ResourcesHTTP.java:54)
    at bio.guoda.preston.ResourcesHTTP.asInputStream(ResourcesHTTP.java:58)
    at bio.guoda.preston.store.DereferencerContentAddressed.get(DereferencerContentAddressed.java:21)
    at bio.guoda.preston.store.DereferencerContentAddressed.get(DereferencerContentAddressed.java:8)
    at bio.guoda.preston.store.Archiver.handleBlankVersion(Archiver.java:49)
    at bio.guoda.preston.store.VersionProcessor.on(VersionProcessor.java:27)
    at bio.guoda.preston.store.StatementsListenerEmitterAdapter.on(StatementsListenerEmitterAdapter.java:14)
    at bio.guoda.preston.cmd.CmdTrack.processQueue(CmdTrack.java:46)
    at bio.guoda.preston.cmd.CmdActivity.run(CmdActivity.java:106)
    at bio.guoda.preston.cmd.CmdActivity.run(CmdActivity.java:74)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1939)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
    at picocli.CommandLine.execute(CommandLine.java:2078)
    at bio.guoda.preston.Preston.run(Preston.java:82)
    at bio.guoda.preston.Preston.main(Preston.java:72)
Caused by: org.apache.http.ProtocolException: Target host is not specified
    at org.apache.http.impl.conn.DefaultRoutePlanner.determineRoute(DefaultRoutePlanner.java:71)
    at org.apache.http.impl.client.InternalHttpClient.determineRoute(InternalHttpClient.java:125)
    at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
    ... 24 more
+ build-image-sequence-archive
+ preston alias --data-dir tmp/UCSB-IZC00035429/data --log tsv
+ grep UCSB-IZC00035429
+ grep jpg
+ cut -f1,3
+ sort
+ uniq
+ cut -f2
+ tee tmp/UCSB-IZC00035429/image-hashes.txt
+ nl -n rz
+ parallel --col-sep '\t' 'preston cat --data-dir tmp/UCSB-IZC00035429/data {2} > tmp/UCSB-IZC00035429/{1}-UCSB-IZC00035429.jpg'
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:

  Tange, O. (2023, January 22). GNU Parallel 20230122 ('Bolsonaristas').
  Zenodo. https://doi.org/10.5281/zenodo.7558957

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#citation-notice

To silence this citation notice: run 'parallel --citation' once.

Come on: You have run parallel 32 times. Isn't it about time 
you run 'parallel --citation' once to silence the citation notice?

+ local BEE_IMAGE_ZIP=dist/UCSB-IZC00035429/imageseq.zip
+ zip --no-dir-entries dist/UCSB-IZC00035429/imageseq.zip 'tmp/UCSB-IZC00035429/*.jpg'
    zip warning: name not matched: tmp/UCSB-IZC00035429/*.jpg

zip error: Nothing to do! (dist/UCSB-IZC00035429/imageseq.zip)

jhpoelen commented 1 year ago

@seltmann apologies for the friction here. Some of the errors appear to be caused by the use of "sh" (Bourne shell) instead of bash. I use bash, and it appears that you use sh.

http://mywiki.wooledge.org/Bashism

I'll have a look at ways to make our results a little less platform (or tool) dependent.
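
One likely source of the "not a valid identifier" message: when bash is invoked as sh it runs in POSIX mode, where function names may contain only letters, digits, and underscores, so a hyphenated name is rejected. A minimal sketch of a portable naming style follows. The underscore names are hypothetical renames and the function bodies are placeholders, not the actual contents of create-imageseq.sh.

```shell
#!/bin/sh
# Hypothetical renames: underscores instead of hyphens, so that the names
# are valid identifiers under sh as well as bash.
track_collection_extract_images() {
  # Placeholder body; the real function would call preston.
  echo "tracking images for $1"
}

build_image_sequence_archive() {
  # Placeholder body; the real function would build the zip archive.
  echo "building archive for $1"
}

# Fall back to an example catalog number when no argument is given.
CATALOG_NUMBER="${1:-UCSB-IZC00035429}"
track_collection_extract_images "$CATALOG_NUMBER"
build_image_sequence_archive "$CATALOG_NUMBER"
```

Saved under any file name, this runs the same way with `sh` and with `bash`.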

jhpoelen commented 1 year ago

With recent changes, I was able to get

sh create-imageseq.sh "UCSB-IZC00035429" "https://library.big-bee.net/portal/content/dwca/UCSB-IZC_DwC-A.zip"

and

bash create-imageseq.sh "UCSB-IZC00035429" "https://library.big-bee.net/portal/content/dwca/UCSB-IZC_DwC-A.zip"

to generate image sequences.

Can you please update the script and reproduce?

jhpoelen commented 1 year ago

For some reason, I was unable to reproduce the error you saw

/Users/katjaseltmann/Documents/imageseq/create-imageseq.sh: line 12: 2: //library.big-bee.net/portal/content/dwca/UCSB-IZC_DwC-A.zip: syntax error: operand expected (error token is "//library.big-bee.net/portal/content/dwca/UCSB-IZC_DwC-A.zip")

Makes me wonder whether I should add a test script to run on various platforms.
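
Such a test script might start as a syntax check under each available shell. This is a rough sketch, not something that exists in the repository: the function name `check_under_shells` and the list of shells are assumptions, and `-n` only parses a script without running it, so it may not catch every incompatibility.

```shell
#!/bin/sh
# Parse a script (without executing it) under every listed shell that is
# installed. Prints one line per shell; returns non-zero if any check fails.
check_under_shells() {
  script="$1"
  rc=0
  for shell in sh bash dash; do
    if ! command -v "$shell" >/dev/null 2>&1; then
      echo "$shell: skipped (not installed)"
      continue
    fi
    if "$shell" -n "$script" 2>/dev/null; then
      echo "$shell: ok"
    else
      echo "$shell: FAILED"
      rc=1
    fi
  done
  return "$rc"
}

# Demo on a throwaway script; for the real thing one would run:
#   check_under_shells create-imageseq.sh
demo=$(mktemp)
printf 'greet() { echo hi; }\ngreet\n' > "$demo"
check_under_shells "$demo"
rm -f "$demo"
```

Running the full script end to end under each shell with a known catalog number would be a stronger test, at the cost of needing network access.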

seltmann commented 1 year ago

@jhpoelen it worked like a charm! I created a wonderful spinning bee and retrieved all of the data.

jhpoelen commented 1 year ago

@seltmann glad to hear you got the :bee: spinning!