clingen-data-model / clinvar-streams

Revisit lazy processing of files in GCP bucket #63

Closed theferrit32 closed 1 year ago

theferrit32 commented 1 year ago

Currently this is not done lazily: the function that iterates over lines calls a callback for each output message, which lets the caller define what happens for each one. This was a quick solution, but it may be better implemented with a lazy-cat approach that keeps appending batches of lines while holding the file handle open, and closes the handle upon reaching the end of the file.
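A minimal sketch of the lazy-cat idea, for illustration only. The names `read-batch` and `lazy-line-batches` are hypothetical and do not come from the repository, and a `StringReader` stands in for a reader over a bucket object:

```clojure
(import '(java.io BufferedReader StringReader))

(defn read-batch
  "Eagerly reads up to n lines from rdr. Returns a vector, empty at end of input."
  [^BufferedReader rdr n]
  (loop [acc []]
    (if (< (count acc) n)
      (if-some [line (.readLine rdr)]
        (recur (conj acc line))
        acc)
      acc)))

(defn lazy-line-batches
  "Hypothetical helper: lazy seq of lines, read batch-size at a time.
  The reader stays open between batches and is closed once input is exhausted."
  [^BufferedReader rdr batch-size]
  (lazy-seq
    (let [batch (read-batch rdr batch-size)]
      (if (seq batch)
        ;; Append this batch, then defer reading the next one.
        (lazy-cat batch (lazy-line-batches rdr batch-size))
        ;; Nothing left to read: close the handle and end the seq.
        (do (.close rdr) nil)))))

;; In-memory input standing in for a file in the bucket (assumption).
(def demo-lines
  (lazy-line-batches (BufferedReader. (StringReader. "a\nb\nc\nd\ne")) 2))

(vec demo-lines) ; => ["a" "b" "c" "d" "e"]
```

One caveat with this approach: the handle is only closed if the consumer realizes the seq to the end, so a caller that abandons it early would leak the reader.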

https://github.com/clingen-data-model/clinvar-streams/pull/60#discussion_r947244170

theferrit32 commented 1 year ago

This is being further addressed by removing the callback strategy used in #60 and replacing it with a lazy-seq strategy.

process-clinvar-drop-refactor will return a lazy seq of output messages generated by a stack of for loops which will internally handle opening/closing file handles as needed.
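The shape described above might look roughly like the sketch below. It is not the code in `stream.clj`: `closing-line-seq`, `drop-messages`, `open-fake`, and the message map keys are all hypothetical, and an in-memory map stands in for the bucket.

```clojure
(import '(java.io BufferedReader StringReader))

(defn closing-line-seq
  "Hypothetical helper: lazy seq of lines from rdr. Closes rdr at end of input."
  [^BufferedReader rdr]
  (lazy-seq
    (if-some [line (.readLine rdr)]
      (cons line (closing-line-seq rdr))
      (do (.close rdr) nil))))

(defn drop-messages
  "Hypothetical stand-in for the refactored function: returns a lazy seq of
  output messages. open-reader is called on a file only when iteration reaches it."
  [open-reader files]
  (for [file files
        line (closing-line-seq (open-reader file))]
    {:file file :content line}))

;; In-memory stand-in for files in the bucket (assumption).
(def fake-bucket {"a.txt" "a1\na2"
                  "b.txt" "b1"})

;; Records which files have been opened so far, to make the laziness visible.
(def opened (atom []))

(defn open-fake [file]
  (swap! opened conj file)
  (BufferedReader. (StringReader. (get fake-bucket file))))

(def messages (drop-messages open-fake ["a.txt" "b.txt"]))
```

With this arrangement nothing is opened when `messages` is defined, and the caller simply consumes the seq (for example with `doseq` or `run!`) instead of passing a callback.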

https://github.com/clingen-data-model/clinvar-streams/blob/c911c14e28cfbbe12d4f048598cbe3316b3c8bb2/src/clinvar_raw/stream.clj#L142-L194