Open harrysprogramming opened 4 years ago
What exactly are you asking?
When you run prepare_data.py, it accesses the files in new_data. Your files should be called train (I forget the extensions, but they are the same as the train files already in new_data). prepare_data.py will then create a new directory, ./data, and place train.to and train.from in it; these are the files that are used for training.
Inside ./data will be the 6 files that prepare_data.py created
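If you want to double-check what ended up in ./data, a small script along these lines could list the line count of each file. This is only a sketch and not part of the project: it assumes the six file names shown in the progress output posted in this thread (train, tst2012, tst2013, each with .from and .to) and that ./data is relative to where you run it. The names `EXPECTED`, `count_lines`, and `report` are made up for illustration.

```python
from pathlib import Path

# File names taken from the progress output quoted in this thread (assumption:
# these are the six files prepare_data.py writes).
EXPECTED = [
    f"{stem}.{ext}"
    for ext in ("from", "to")
    for stem in ("train", "tst2012", "tst2013")
]


def count_lines(path):
    """Return the number of lines in a file."""
    with open(path, "rb") as fh:
        return sum(1 for _ in fh)


def report(data_dir="data"):
    """Map each expected file name to its line count, or None if it is missing."""
    base = Path(data_dir)
    return {
        name: count_lines(base / name) if (base / name).is_file() else None
        for name in EXPECTED
    }


if __name__ == "__main__":
    for name, n in report().items():
        print(f"{name}: {'missing' if n is None else n}")
```

Comparing these counts against the number of lines in your own input files shows quickly whether a file was truncated.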
Hello! Can you please help me with my prepare_data.py file? Maybe we can contact via email or something like Discord, I do not know. I have put a question here, please, have a look at it :)
This is because the test size in settings.py is 100. You can feed any amount of data to prepare_data.py, but because of that setting it will pick only 100 pairs.
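To illustrate the effect, here is a rough sketch of how a size cap like that behaves. It is not the project's actual code: `TEST_SIZE` and `take_pairs` are hypothetical names, and the only thing taken from this thread is that the test size is 100.

```python
from itertools import islice

# Hypothetical name; the thread only says settings.py has a test size of 100.
TEST_SIZE = 100


def take_pairs(from_lines, to_lines, limit=TEST_SIZE):
    """Return at most `limit` (source, target) pairs, in order."""
    return list(islice(zip(from_lines, to_lines), limit))


# 5000 made-up pairs go in, but only the first 100 come out.
questions = [f"q{i}" for i in range(5000)]
answers = [f"a{i}" for i in range(5000)]
print(len(take_pairs(questions, answers)))  # prints 100
```

So however many lines the input has, the output is capped at the configured size; to get more pairs, the setting itself would have to be raised.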
Hi, when I run prepare_data.py using my own files I get this:

Preparing training set from raw set
File: train.from
100%|#######################################################################| 117859/117859 [00:03<00:00, 33688.63 lines/s]
File: tst2012.from
100%|###############################################################################| 100/100 [00:00<00:00, 889.34 lines/s]
File: tst2013.from
100%|##############################################################################| 100/100 [00:00<00:00, 2582.32 lines/s]
File: train.to
100%|#######################################################################| 117859/117859 [00:03<00:00, 35517.59 lines/s]
File: tst2012.to
100%|###############################################################################| 100/100 [00:00<00:00, 924.43 lines/s]
File: tst2013.to
100%|##############################################################################| 100/100 [00:00<00:00, 2715.36 lines/s]

I know the file names are the same, but that's because I get an error if they are different. The train .to/.from files are the originals, but the tst2012/tst2013 files are my own. When I run the prepare data script it says 'Preparing training set from raw set'. The original files show 117859/117859, whereas my own files show 100/100. What am I doing wrong?