daniel-kukiela / nmt-chatbot

NMT Chatbot
GNU General Public License v3.0

PLZ HELP ASAP #150

Open harrysprogramming opened 4 years ago

harrysprogramming commented 4 years ago

Hi, when I run prepare_data.py using my own files I get this:

Preparing training set from raw set
File: train.from 100%|#######################################################################| 117859/117859 [00:03<00:00, 33688.63 lines/s]
File: tst2012.from 100%|###############################################################################| 100/100 [00:00<00:00, 889.34 lines/s]
File: tst2013.from 100%|##############################################################################| 100/100 [00:00<00:00, 2582.32 lines/s]
File: train.to 100%|#######################################################################| 117859/117859 [00:03<00:00, 35517.59 lines/s]
File: tst2012.to 100%|###############################################################################| 100/100 [00:00<00:00, 924.43 lines/s]
File: tst2013.to 100%|##############################################################################| 100/100 [00:00<00:00, 2715.36 lines/s]

I know the file names are the same, but that's because I get an error if they are different. The train.to/train.from files are the originals, but the tst2012/tst2013 files are my own. When I run the prepare data script it says 'Preparing training set from raw set'. The original files show 117859/117859, whereas my own files show 100/100. What am I doing wrong?

Nathan-Chell commented 4 years ago

What exactly are you asking?

When you run prepare_data.py, it reads the files in new_data. Your files should be called train (I forget the extensions, but they are the same as those of the train files already in new_data). prepare_data.py will then create a new directory, ./data, and place train.to and train.from in it. These are the files that are used for training.

Nathan-Chell commented 4 years ago

Inside ./data you will find the 6 files that prepare_data.py created.
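
To see how many lines actually ended up in each prepared file, something like the following could help. This is only a rough sketch: the `count_lines` and `report` helpers are made up for illustration and are not part of nmt-chatbot, and the `data` path assumes the script is run from the folder that contains the ./data directory mentioned above.

```python
from pathlib import Path


def count_lines(path):
    """Count the lines in a text file without loading it all into memory."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def report(directory):
    """Map each .from/.to file name in `directory` to its line count."""
    counts = {}
    for path in sorted(Path(directory).iterdir()):
        if path.suffix in (".from", ".to"):
            counts[path.name] = count_lines(path)
    return counts


# Assumed location of the prepared files; adjust to your own checkout.
data_dir = Path("data")
if data_dir.is_dir():
    for name, lines in report(data_dir).items():
        print(f"{name}: {lines} lines")
```

Comparing these counts with the counts for the raw files in new_data shows whether lines were dropped or capped during preparation.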

kobrata6 commented 4 years ago

Hello! Can you please help me with my prepare_data.py file? Maybe we can get in touch via email or something like Discord. I have posted a question here; please have a look at it :)

astrickash commented 3 years ago

This is because the test size in settings.py is 100. You can feed any amount of data to prepare_data.py, but because of that setting it will pick only 100 pairs.
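
The effect can be illustrated with a small standalone sketch. Note that `TEST_SIZE` and `take_pairs` below are hypothetical names used only for this example; they are not the actual code or setting names from nmt-chatbot, which should be checked in settings.py directly.

```python
from itertools import islice

# Hypothetical stand-in for the test size setting mentioned above; the real
# key name and its place inside settings.py are not confirmed here.
TEST_SIZE = 100


def take_pairs(from_lines, to_lines, limit=TEST_SIZE):
    """Return at most `limit` (source, target) pairs, whatever the input size."""
    return list(islice(zip(from_lines, to_lines), limit))


# 5000 raw pairs go in, but the cap decides how many come out.
questions = [f"question {i}" for i in range(5000)]
answers = [f"answer {i}" for i in range(5000)]
pairs = take_pairs(questions, answers)
print(len(pairs))  # 100
```

If the preparation step works roughly like this, the 100/100 in the progress output reflects the cap rather than the size of the input files, so raising the setting would be the way to get more test pairs.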