DivineOmega / uxdm

🔀 UXDM helps developers migrate data from one system or format to another.
GNU Lesser General Public License v3.0

CSVSource performance with many rows #38

Open jameswilddev opened 3 years ago

jameswilddev commented 3 years ago

CSVSource appears to re-parse the file from the beginning up to the current page every time a page is requested. With only 10 rows per page, a 100,000-row CSV is split into 10,000 pages, so the file is read and parsed (from the start up to the requested page) 10,000 times per migration.
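To put a rough number on that, here is a small Python model. It is not UXDM code, and it assumes (not confirmed from the CSVSource source) that each page request re-reads every row from the first data row through the end of that page. It counts the total rows parsed over a whole migration:

```python
import math

def rows_parsed_with_rescans(total_rows: int, per_page: int) -> int:
    """Total rows parsed over a migration, assuming each page request
    re-reads the CSV from the first data row through the end of that page."""
    pages = math.ceil(total_rows / per_page)
    return sum(min(page * per_page, total_rows) for page in range(1, pages + 1))

for per_page in (10, 100, 1000):
    print(per_page, rows_parsed_with_rescans(100_000, per_page))
# 10 500050000
# 100 50050000
# 1000 5050000
```

Under that assumption the cost grows roughly as rows² / (2 × perPage): about 500 million row parses at 10 rows per page versus about 5 million at 1000, while a single pass would touch each of the 100,000 rows once.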

I can improve migration performance a lot by bumping perPage to 100 or 1000, but there might be a better fix: scan through the file once at construction to build an array of offsets marking where each page starts in the file, then use that to skip straight to the appropriate section of the file when a page is requested.
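A minimal sketch of the offset-index idea follows, written in Python rather than PHP. `IndexedCsvPages`, `get_page` and `_read_record` are hypothetical names, not part of UXDM. The sketch assumes RFC 4180-style CSV (double-quote as the quote character, embedded quotes doubled), a header row, and no blank lines. Under those assumptions, a record ends at the first newline where the running count of `"` characters is even. That check lets quoted fields containing newlines stay intact without a full parse during the indexing pass:

```python
import csv
import io

class IndexedCsvPages:
    """Hypothetical helper: one pass records the byte offset where each page
    of `per_page` data rows starts; later page reads seek straight there."""

    def __init__(self, path: str, per_page: int, encoding: str = "utf-8"):
        self.path = path
        self.per_page = per_page
        self.encoding = encoding
        self.page_offsets: list[int] = []
        with open(path, "rb") as f:
            self._read_record(f)  # skip the header row
            row_index = 0
            while True:
                offset = f.tell()  # byte position where this record begins
                if self._read_record(f) is None:
                    break
                if row_index % per_page == 0:
                    self.page_offsets.append(offset)  # first row of a page
                row_index += 1

    @staticmethod
    def _read_record(f):
        """Read one record's raw bytes, joining physical lines while a quoted
        field is still open (odd number of '"' seen so far)."""
        chunks = []
        quotes = 0
        while True:
            line = f.readline()
            if not line:
                return b"".join(chunks) if chunks else None
            chunks.append(line)
            quotes += line.count(b'"')
            if quotes % 2 == 0:
                return b"".join(chunks)

    def get_page(self, page: int) -> list[list[str]]:
        """Return the rows of 1-indexed `page`, parsing only that page."""
        if page < 1 or page > len(self.page_offsets):
            return []
        rows = []
        with open(self.path, "rb") as f:
            f.seek(self.page_offsets[page - 1])
            for _ in range(self.per_page):
                record = self._read_record(f)
                if record is None:
                    break
                text = io.StringIO(record.decode(self.encoding), newline="")
                rows.extend(csv.reader(text))
        return rows
```

Usage would look like `pages = IndexedCsvPages("data.csv", per_page=10)` followed by `pages.get_page(3)`. The index costs one integer per page in memory, and each page request then reads only that page's rows instead of everything before it. In PHP the same idea would map onto `ftell()` and `fseek()` on the file handle.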