betagouv / analyse-flux-insertion

Tool for analysing data flows and exchanges in the field of social and professional integration (insertion)

refactor(lecteur): Process batches of applications in large files #48

Closed aminedhobb closed 3 years ago

aminedhobb commented 3 years ago

In this PR I refactor the changes from #47 so that every application found in a file chunk is processed, not just one. This substantially improves performance: for the 748 MB file, processing time went from ~20 minutes to 75 seconds.

Also, my previous implementation may have produced inaccurate results for very large files, because it did not use the index of the first match when computing the first offset. This PR fixes that.
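
For readers who want a concrete picture of the idea, here is a minimal sketch in Python. It is not the code from this PR. The `<application>` delimiter, the `RECORD` pattern, the `iter_records` name and the chunk size are all assumptions made for illustration. It shows the two points described above: every complete record in a chunk is consumed in one pass, and each record's absolute offset is derived from the index of its match rather than from the start of the chunk.

```python
import io
import re

# Hypothetical record delimiter: the real file format is not shown in this PR.
RECORD = re.compile(rb"<application>.*?</application>", re.DOTALL)


def iter_records(stream, chunk_size=1 << 20):
    """Yield (absolute_offset, record_bytes) for each complete record in stream."""
    buffer = b""
    base = 0  # absolute offset in the stream of buffer[0]
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        consumed = 0
        # Handle every complete record in the buffer, not just the first one.
        for match in RECORD.finditer(buffer):
            # The offset comes from the match index, not from the chunk start.
            yield base + match.start(), match.group(0)
            consumed = match.end()
        # Keep the unfinished tail so a record split across chunks is not lost.
        buffer = buffer[consumed:]
        base += consumed
```

Calling `list(iter_records(open(path, "rb")))` on a file in that assumed format would return each record with its byte offset. A record that straddles a chunk boundary is yielded once the rest of it has been read.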