Kurt-Hetrick / JHG_Clincal_Exome_Pipeline

generate md5 for git-lfs. compare to static file. #42

Closed Kurt-Hetrick closed 1 year ago

Kurt-Hetrick commented 1 year ago

tentative. depends on wall clock time. if the wall clock time is not prohibitive at submission, then exit. otherwise, it's a warning (compute node script with a teams notification at the end).

Kurt-Hetrick commented 1 year ago

this might be unnecessary at this time for this pipeline, as I believe there are currently no untracked files for this pipeline. still, it might not be a bad thing to do anyway in case it does happen in the future. it does need to be done for the cftr whole gene pipeline, and that code can, for the most part, be ported over here.

Kurt-Hetrick commented 1 year ago

doing this for the cftr pipeline is not viable... it is viable for this pipeline (wall clock time is roughly 3 minutes in a single-threaded application). however, there are currently no files that are not tracked via git lfs, so there will already be a separate validation step to be implemented in the future. so... punting to avoid spending time on unnecessary work, since I need to work on other projects and this process and implementation has snowballed into an extraordinary amount of unplanned work for me. if there comes a point where files are too big to be tracked by git lfs, then i will revisit.

Kurt-Hetrick commented 1 year ago

i forgot that the control gvcf file is too big for git lfs, so an md5 validation needs to be done. however, i am going to elect to do it as a separate step anyway, with a warning, as opposed to within the submission with a hard exit. the reason is that doing it that way is a future-proofing step in case it gets to the point where there are too many (or too big) files to validate within the submission in a reasonable time frame, as opposed to implementing it one way and then having to rip it all out and redo it another way down the road. the other benefit is that this code can be transferred from the cftr pipeline, since that's how it is done there.

there is one caveat that should be mentioned, although the probability of it occurring is extremely low. for some reason, i can't get the md5 validation teams notification to send without making the mail command verbose. doing this writes the email to a file on the local server (which I delete after the notification is sent). so there is the potential for some sort of collision... although the chance of that is extremely low, and I don't know if there are already some sort of safeguards in place (i believe it does some sort of file locking on that file, but am not sure).
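for reference, a minimal sketch of what a warn-only md5 check against a static file could look like. this is not the actual pipeline code (which is carried over from the cftr pipeline); it is a Python illustration only. the function names and the manifest layout (md5sum-style `<digest>  <path>` lines) are assumptions made for the example, and the teams notification step is left out entirely.

```python
import hashlib
import sys
from pathlib import Path


def md5sum(path, chunk_size=1024 * 1024):
    """Return the hex md5 of a file, reading in chunks so big files (e.g. a gvcf) fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_manifest(manifest_path):
    """Parse a static manifest of md5sum-style lines into {relative path: digest}.

    The manifest format is an assumption for this sketch.
    """
    expected = {}
    for line in Path(manifest_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        # md5sum marks binary mode with a leading "*" on the file name
        expected[name.lstrip("*")] = digest.lower()
    return expected


def validate(manifest_path, base_dir="."):
    """Return a list of problems; an empty list means every listed file matched."""
    problems = []
    for name, digest in load_manifest(manifest_path).items():
        target = Path(base_dir) / name
        if not target.is_file():
            problems.append(f"MISSING {name}")
        elif md5sum(target) != digest:
            problems.append(f"MISMATCH {name}")
    return problems


def main(argv):
    """Warn-only entry point: report problems on stderr, never block the pipeline."""
    manifest = argv[0]
    base_dir = argv[1] if len(argv) > 1 else "."
    for problem in validate(manifest, base_dir):
        print(f"WARNING: md5 validation: {problem}", file=sys.stderr)
    # always 0: this is the "separate step with a warning" variant, not a hard exit
    return 0
```

calling `main(["static.md5", "/path/to/resources"])` (both names hypothetical) prints one warning per missing or mismatched file and still returns 0. switching to the hard-exit variant would only mean returning a non-zero code when `validate` finds anything.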

Kurt-Hetrick commented 1 year ago

done