NASA-PDS / nucleus

Nucleus is a software platform used to create workflows for the Planetary Data System (PDS).
https://nasa-pds.github.io/nucleus
Apache License 2.0

Support processing of PDS products with fz file format in Nucleus #74

Closed ramesh-maddegoda closed 6 months ago

ramesh-maddegoda commented 8 months ago

Some of the PDS products processed by Nucleus contain only the compressed version of the data (fitz), not the uncompressed version (fits), in order to save space when archiving the data. Also, the product label XML refers to the .fits format, while the actual files shared with the product label are in .fitz format. This makes it difficult to determine whether a complete product has been received when the files are uploaded to the staging S3 bucket to be processed by Nucleus.
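One way to picture the completeness check is sketched below. This is only an illustration, not the logic Nucleus uses: the function names are hypothetical, and the naming rule (a compressed copy is the labeled file name with ".fz" appended) is an assumption that should be adjusted to whatever naming the data provider actually follows.

```python
def candidate_names(expected_name, compressed_suffix=".fz"):
    """Names that may satisfy one file reference from the product label.

    Assumption: the compressed copy is named by appending `compressed_suffix`
    to the name given in the label (e.g. image.fits -> image.fits.fz).
    """
    return {expected_name, expected_name + compressed_suffix}


def missing_files(expected_names, received_names, compressed_suffix=".fz"):
    """Return the labeled files for which neither form has been received."""
    received = set(received_names)
    return sorted(
        name
        for name in expected_names
        if not (candidate_names(name, compressed_suffix) & received)
    )


def is_product_complete(expected_names, received_names, compressed_suffix=".fz"):
    """A product is complete when every labeled file is present in some form."""
    return not missing_files(expected_names, received_names, compressed_suffix)
```

Here `expected_names` would come from the file references in the product label and `received_names` from the object keys seen so far in the staging bucket; both are plain lists of strings in this sketch.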

It is required to,

References:

ramesh-maddegoda commented 8 months ago

It was required to build an Amazon Linux compatible version of fpack/funpack from the source code.

Download the latest version (4.3.0) of CFITSIO. The source code package is the Unix .tar file cfitsio-4.3.0.tar.gz; see the README file for instructions. The .tar file unpacks its files into a directory named 'cfitsio-4.3.0'. (The latest version is always available as cfitsio_latest.tar.gz.)

zlib was missing from the Amazon Linux EC2 instance that was used for the build. Therefore, it was also required to build zlib.

./configure
make -j4
make install

jordanpadams commented 8 months ago

Status: Ramesh working on architecture and implementation in Nucleus

tloubrieu-jpl commented 7 months ago

The version of validate used by Ramesh does not validate the CSS products. Ramesh will upgrade it.

ramesh-maddegoda commented 7 months ago

The pull request is available at https://github.com/NASA-PDS/nucleus/pull/78

ramesh-maddegoda commented 7 months ago

The Lambda-based implementation made for this was working, but it was expensive because the Lambda function spends a large amount of time downloading and extracting the .fz file. It was decided to come up with a more cost-effective option to download and extract .fz files.
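For reference, the extraction step itself, wherever it ends up running, can be sketched roughly as follows. This is a minimal illustration rather than the code in the pull request: the helper names are hypothetical, it assumes the funpack binary built above is on the PATH, and it relies on funpack's -O option to name the output file.

```python
import subprocess
from pathlib import Path


def uncompressed_path(fz_path):
    """Derive the output name by dropping the trailing .fz (image.fits.fz -> image.fits)."""
    fz_path = Path(fz_path)
    if fz_path.suffix != ".fz":
        raise ValueError(f"not a .fz file: {fz_path}")
    return fz_path.with_suffix("")


def funpack_command(fz_path, funpack_bin="funpack"):
    """Build the funpack invocation; -O sets the name of the uncompressed output file."""
    return [funpack_bin, "-O", str(uncompressed_path(fz_path)), str(fz_path)]


def extract(fz_path, funpack_bin="funpack"):
    """Run funpack on a local .fz file and return the path of the uncompressed file."""
    subprocess.run(funpack_command(fz_path, funpack_bin), check=True)
    return uncompressed_path(fz_path)
```

The download from S3 and the upload of the uncompressed file are left out on purpose, since that part depends on which service performs the copy.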

tloubrieu-jpl commented 7 months ago

@ramesh-maddegoda is working on the new, more cost-effective design with DataSync.

tloubrieu-jpl commented 7 months ago

90% of the development is done.

jordanpadams commented 6 months ago

Status: @ramesh-maddegoda working through some issues with duplicate data in database. Lambda is working as expected. Able to trigger Nucleus workflow. Harvest is failing because our OpenSearch connection is failing; the reason is TBD. PR to be submitted before vacation.

jordanpadams commented 6 months ago

Per #80 going to close this out with initial implementation completed. If additional work is required, we will open more tickets.