averkij / lingtrain-aligner

Lingtrain Aligner — ML powered library for the accurate texts alignment.
GNU General Public License v3.0

Support for Larger Files and Consecutive Batch Execution #11

Open charliekocsis opened 2 months ago

charliekocsis commented 2 months ago

I really appreciate this program! Is there an option to increase the file size limit for uploads? I'm looking to process files around 500MB. Additionally, it would be helpful if the batch size could be increased. I’m comfortable with programming, so if you could guide me on what needs to be modified, I’d be happy to implement the changes myself. Another feature that would be beneficial is the ability to run through all batches consecutively without manual intervention. Currently, I have to click 'next batch' about 18 times and wait for each one to finish. It would be great to have an automated process for this.

averkij commented 1 month ago

Hello! Sorry for the late answer.

You can align the texts directly in Python and implement whatever logic you need. Here is an example; just change the model to an appropriate one.

https://colab.research.google.com/drive/1q4hqSaht8xsl_CXlZdKBi4cDFPYNZA-p

The UI is meant for easy manual checking and for conflicts that can't be resolved automatically. Considering the size of your files, I suppose you want to extract parallel candidates on a massive scale.

So try the Python example for that.
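
To illustrate the "run all batches without clicking" part of the request, here is a minimal, library-agnostic sketch of such a loop. It does not use the lingtrain-aligner API: `align_batch` is a hypothetical callback standing in for whatever alignment call the Colab notebook uses, and `batch_ranges` and `run_all_batches` are names made up for this example.

```python
from typing import Callable, List


def batch_ranges(total_lines: int, batch_size: int) -> List[range]:
    """Split line indices 0..total_lines-1 into consecutive batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        range(start, min(start + batch_size, total_lines))
        for start in range(0, total_lines, batch_size)
    ]


def run_all_batches(
    total_lines: int,
    batch_size: int,
    align_batch: Callable[[int, range], None],
) -> List[int]:
    """Call align_batch(batch_id, line_range) for every batch, in order.

    align_batch is a placeholder (an assumption of this sketch): replace it
    with the real alignment call from the notebook.
    Returns the ids of the batches that were processed.
    """
    done = []
    for batch_id, lines in enumerate(batch_ranges(total_lines, batch_size)):
        align_batch(batch_id, lines)  # blocks until this batch is finished
        done.append(batch_id)
    return done


if __name__ == "__main__":
    # Dummy callback that only reports which lines each batch would cover.
    def report(batch_id: int, lines: range) -> None:
        print(f"batch {batch_id}: lines {lines.start}..{lines.stop - 1}")

    run_all_batches(total_lines=250, batch_size=100, align_batch=report)
```

Because the batch size is just a parameter here, raising it is a one-argument change; how large it can go in practice depends on the model and the available memory.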