EBISPOT / goci

GWAS Catalog Ontology and Curation Infrastructure
Apache License 2.0

Create strategy to delete harmonisation log files #1349

Closed ljwh2 closed 2 days ago

jiyue1214 commented 1 week ago

Planned strategy after discussing with Laura:

  1. Generate a plain-text log entry containing the GCST_id, step and error message from the .command.err file.
  2. Delete all intermediate work folders. (We can always go back and reproduce a specific error if the error message is not enough.)

After the harmonisation pipeline has run, apart from the successfully harmonised results, which have been moved to the FTP, we only need a plain-text log file with two columns: GCST and the error message from the last error file.

Example:

GCST            process               Error message
GCST90012670    ten_percent_counts    main_pysam.py: error: the following arguments are required: --effAl_col, --otherAl_col
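
The log row described above could be built roughly as follows. This is only a minimal sketch, not the script used in the pipeline: the function names `last_error_line` and `log_row`, the tab separator, and the choice of "last non-empty line" as the error message are all assumptions made for illustration.

```python
from pathlib import Path


def last_error_line(err_file: Path) -> str:
    """Return the last non-empty line of an error file, or '' if there is none."""
    text = err_file.read_text(errors="replace")
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def log_row(gcst_id: str, step: str, err_file: Path) -> str:
    """Build one tab-separated row: GCST id, step, error message (hypothetical layout)."""
    return "\t".join([gcst_id, step, last_error_line(err_file)])
```

Appending one such row per failed step to a single text file would leave the work folder safe to delete afterwards.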

To do list:

  1. In the pipeline: tag each step not only by GCST but also by "chr" (failures on X, Y and MT are expected).
  2. Prepare the script to reformat the log info and clean the entire work folder (last week).
  3. Set up a scrontab job to run the script.

jiyue1214 commented 1 week ago

I have cleaned up log files older than 1 month. The new cleaning script now runs daily and removes the logs of jobs launched 1 month earlier.
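
For illustration, an age-based cleanup of this kind might look like the sketch below. It is not the script in the gwasutil repo: the function names, the use of directory modification time as the job date, and the approximation of "1 month" as 30 days are all assumptions.

```python
import shutil
import time
from pathlib import Path
from typing import List

MAX_AGE_DAYS = 30  # assumption: "1 month" approximated as 30 days


def stale_dirs(root: Path, now: float, max_age_days: int = MAX_AGE_DAYS) -> List[Path]:
    """List sub-directories of root whose modification time is older than the cutoff."""
    cutoff = now - max_age_days * 86400
    return sorted(p for p in root.iterdir() if p.is_dir() and p.stat().st_mtime < cutoff)


def clean(root: Path, dry_run: bool = True) -> List[Path]:
    """Return the stale directories; delete them only when dry_run is False."""
    targets = stale_dirs(root, time.time())
    if not dry_run:
        for path in targets:
            shutil.rmtree(path)
    return targets
```

Defaulting to `dry_run=True` means a first run only reports what would be removed, which is a cheap safeguard for a job that deletes data unattended.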

jiyue1214 commented 3 days ago

For jobs finished before 19th June 2024,

For jobs run after 19th June 2024: logs are cleaned after 1 month by a scrontab job that runs daily (a small, quick job).

For the log_clean_script:

  1. Added to the scrontab job; it will run daily from 19th July.
  2. Updated in the gwasutil repo.

Expected log file size:

  1. About 700K, covering 20230109 to yesterday.
  2. Daily log will be around 20K.

Error message example:

20240621 majordirectionmaptobuild 1 GCST90133175 raise KeyError(key) from err;KeyError: 'variant_id';
20240621 majordirectionmaptobuild 1 GCST90264796_chr8 value = float(value);ValueError: could not convert string to float: '+';
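
Log lines of this shape can be split back into fields with a few lines of Python. The sketch below assumes the layout seen in the example (date, process, a numeric third field whose meaning is not stated here, a GCST tag with an optional `_chr` suffix, then the message); `LogEntry`, `parse_line` and `is_expected_failure` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    date: str
    process: str
    field3: str            # meaning not stated in the thread; kept verbatim
    gcst_id: str
    chrom: Optional[str]   # None when the tag has no _chr suffix
    message: str


TAG_RE = re.compile(r"^(GCST\d+)(?:_chr(\w+))?$")


def parse_line(line: str) -> LogEntry:
    """Split one log line into its five whitespace-separated parts."""
    date, process, field3, tag, message = line.strip().split(maxsplit=4)
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognised tag: {tag!r}")
    return LogEntry(date, process, field3, match.group(1), match.group(2), message)


def is_expected_failure(entry: LogEntry) -> bool:
    """Failures on X, Y and MT are treated as expected, per the to-do list."""
    return entry.chrom in {"X", "Y", "MT"}
```

Keeping the chromosome separate from the GCST id makes it easy to filter out the expected X, Y and MT failures when reviewing the log.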