TheJacksonLaboratory / splicing-pipelines-nf

Repository for the Anczukow-Lab splicing pipeline

Gen3-DRS files not found in manifest not being saved into a log file #306

Open imendes93 opened 2 years ago

imendes93 commented 2 years ago

Problem

When using the Gen3-DRS option, no information is reported if a file listed in input.csv is not present in the manifest.json file.

Solution

Add a log file listing the file names from input.csv that are not present in the manifest.json file.

Implementation

This should be implemented in the filter_manifest.py helper script by comparing the file names in the resulting filtered manifest with those in the original input.csv file:

# Rows of input.csv whose file_name does not appear in the filtered manifest
missing_df = reads_df[~reads_df['file_name'].isin(manifest_df['file_name'])]
if not missing_df.empty:
    print("The following file_name IDs were not found in manifest:")
    print(missing_df)
    missing_df.to_csv("not_found_GTEX_samples.txt", index=False)

This has been tried in tag Simplify-Gen3-DRS-7, corresponding to the failing run https://cloudos.lifebit.ai/public/jobs/6203e3cb91203701dcbcb686
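
For anyone who wants to try the check outside the pipeline, here is a minimal standalone sketch using pandas. It is not the code in filter_manifest.py: the function name find_missing_files, the log_path parameter, the sample file names, and the output file not_found_example.txt are all made up for illustration. It only assumes, as in the snippet above, that both data frames have a file_name column.

```python
import pandas as pd


def find_missing_files(reads_df, manifest_df, log_path="not_found_GTEX_samples.txt"):
    """Return the rows of reads_df whose file_name is absent from manifest_df.

    Hypothetical helper for illustration only. If any rows are missing,
    they are printed and written to log_path as CSV.
    """
    missing_df = reads_df[~reads_df["file_name"].isin(manifest_df["file_name"])]
    if not missing_df.empty:
        print("The following file_name IDs were not found in manifest:")
        print(missing_df.to_string(index=False))
        missing_df.to_csv(log_path, index=False)
    return missing_df


# Made-up data standing in for input.csv and the filtered manifest
reads_df = pd.DataFrame({"file_name": ["sample_A.bam", "sample_B.bam", "sample_C.bam"]})
manifest_df = pd.DataFrame({"file_name": ["sample_A.bam", "sample_C.bam"]})

missing = find_missing_files(reads_df, manifest_df, log_path="not_found_example.txt")
```

Returning the missing rows as well as writing them to disk would let the caller decide whether to stop the run or just warn.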