Preprocess：tgt_vocab_size is always four

chens72 commented 3 years ago

When preprocessing, tgt_vocab_size is always 4， and src_vocab_size is always 2. I don't know what caused this, because it was not like this before.

The preprocessing commands are as follows: python preprocess.py \ -train_src data/QGprocess/train-src-ans-features-pos.txt \ -train_tgt data/QGprocess/train-target.txt \ -valid_src data/QGprocess/dev-src-ans-features-pos.txt \ -valid_tgt data/QGprocess/dev-target.txt \ -save_data data/QGprocess-features-pos \ -dynamic_dict \ -lower \ -overwrite \ -src_seq_length 5000 \ -tgt_seq_length 200

The output is as follows: [2020-12-10 01:52:57,466 INFO] Extracting features... [2020-12-10 01:52:58,125 INFO] number of source features: 2. [2020-12-10 01:52:58,125 INFO] number of target features: 0. [2020-12-10 01:52:58,125 INFO] Building Fields object... [2020-12-10 01:52:58,126 INFO] Building & saving training data... [2020-12-10 01:52:58,128 WARNING] Shards for corpus train already exist, will be overwritten because -overwrite option is set. [2020-12-10 01:52:58,135 WARNING] Overwrite shards for corpus None [2020-12-10 01:53:00,268 INFO] Building shard 0. [2020-12-10 01:53:06,547 INFO] saving 0th train data shard to data/QGprocess-features-pos.train.0.pt. [2020-12-10 01:53:11,137 INFO] tgt vocab size: 4. [2020-12-10 01:53:11,137 INFO] src vocab size: 2. [2020-12-10 01:53:11,138 INFO] src_feat_0 vocab size: 2. [2020-12-10 01:53:11,138 INFO] src_feat_1 vocab size: 2. [2020-12-10 01:53:11,332 INFO] Building & saving validation data... [2020-12-10 01:53:11,333 WARNING] Shards for corpus valid already exist, will be overwritten because -overwrite option is set. [2020-12-10 01:53:11,339 WARNING] Overwrite shards for corpus None [2020-12-10 01:53:12,281 INFO] Building shard 0. [2020-12-10 01:53:13,040 INFO] saving 0th valid data shard to data/QGprocess-features-pos.valid.0.pt.

It seems that it did not calculate the token in the corpus.

francoishernandez commented 3 years ago

This is strange.

Which version are you using?
Can you post head -10 of your various data files?

chens72 commented 3 years ago

The version I use is OpenNMT-py 1.2.0
The data are as follows: train-src-ans-features-pos.txt： the￨O￨O british￨NATIONALITY￨O isles￨O￨O are￨O￨O a￨O￨O group￨O￨O of￨O￨O islands￨O￨O off￨O￨O the￨O￨O north￨O￨O -￨O￨O western￨O￨O coast￨O￨O of￨O￨O continental￨O￨O europe￨O￨O that￨O￨O consist￨O￨O of￨O￨O the￨O￨O islands￨O￨O of￨O￨O great￨COUNTRY￨B britain￨COUNTRY￨I ,￨O￨I ireland￨COUNTRY￨I and￨O￨O over￨O￨O six￨NUMBER￨O thousand￨NUMBER￨O smaller￨O￨O isles￨O￨O .￨O￨O situated￨O￨O in￨O￨O the￨O￨O north￨O￨O atlantic￨O￨O ,￨O￨O the￨O￨O islands￨O￨O have￨O￨O a￨O￨O total￨O￨O area￨O￨O of￨O￨O approximately￨O￨O 315￨NUMBER￨O ,￨O￨O 159￨NUMBER￨O km2￨O￨O ,￨O￨O and￨O￨O a￨O￨O combined￨O￨O population￨O￨O of￨O￨O just￨O￨O under￨O￨O 70￨NUMBER￨O million￨NUMBER￨O .￨O￨O two￨NUMBER￨O sovereign￨O￨O states￨O￨O are￨O￨O located￨O￨O on￨O￨O the￨O￨O islands￨O￨O :￨O￨O ireland￨COUNTRY￨O (￨O￨O which￨O￨O covers￨O￨O roughly￨O￨O five￨NUMBER￨O -￨O￨O sixths￨O￨O of￨O￨O the￨O￨O island￨O￨O with￨O￨O the￨O￨O same￨O￨O name￨O￨O )￨O￨O and￨O￨O the￨O￨O united￨COUNTRY￨O kingdom￨COUNTRY￨O of￨COUNTRY￨O great￨COUNTRY￨O britain￨COUNTRY￨O and￨COUNTRY￨O northern￨COUNTRY￨O ireland￨COUNTRY￨O .￨O￨O the￨O￨O british￨NATIONALITY￨O isles￨O￨O also￨O￨O include￨O￨O three￨NUMBER￨O crown￨O￨O dependencies￨O￨O :￨O￨O the￨O￨O isle￨O￨O of￨O￨O man￨O￨O and￨O￨O ,￨O￨O by￨O￨O tradition￨O￨O ,￨O￨O the￨O￨O bailiwick￨O￨O of￨O￨O jersey￨O￨O and￨O￨O the￨O￨O bailiwick￨O￨O of￨O￨O guernsey￨O￨O in￨O￨O the￨O￨O channel￨O￨O islands￨O￨O ,￨O￨O although￨O￨O the￨O￨O latter￨O￨O are￨O￨O not￨O￨O physically￨O￨O a￨O￨O part￨O￨O of￨O￨O the￨O￨O archipelago￨O￨O .￨O￨O what￨O￨O is￨O￨O two￨NUMBER￨O islands￨O￨O that￨O￨O are￨O￨O part￨O￨O of￨O￨O the￨O￨O british￨NATIONALITY￨O isles￨O￨O ?￨O￨O the￨O￨O oldest￨O￨O rocks￨O￨O in￨O￨O the￨O￨O group￨O￨O are￨O￨O in￨O￨O the￨O￨O north￨O￨O west￨O￨O of￨O￨O scotland￨COUNTRY￨O ,￨O￨O ireland￨COUNTRY￨O and￨O￨O north￨O￨O wales￨O￨O and￨O￨O are￨O￨O 2￨NUMBER￨O ,￨O￨O 700￨DURATION￨O million￨DURATION￨O years￨DURATION￨O old￨DURATION￨O .￨O￨O during￨O￨O the￨O￨O silurian￨MISC￨O period￨O￨O the￨O￨O north￨O￨O -￨O￨O western￨O￨O regions￨O￨O collided￨O￨O with￨O￨O the￨O￨O south￨LOCATION￨O -￨LOCATION￨O east￨LOCATION￨O ,￨O￨O which￨O￨O had￨O￨O been￨O￨O part￨O￨O of￨O￨O a￨O￨O separate￨O￨O continental￨O￨O landmass￨O￨O .￨O￨O the￨O￨O topography￨O￨O of￨O￨O the￨O￨O islands￨O￨O is￨O￨O modest￨O￨O in￨O￨O scale￨O￨O by￨O￨O global￨O￨O standards￨O￨O .￨O￨O ben￨O￨O nevis￨O￨O rises￨O￨O to￨O￨O an￨O￨O elevation￨O￨O of￨O￨O only￨O￨O 1￨NUMBER￨O ,￨O￨O 344￨NUMBER￨O metres￨O￨O (￨O￨O 4￨NUMBER￨O ,￨O￨O 409￨NUMBER￨O ft￨O￨O )￨O￨O ,￨O￨O and￨O￨O lough￨O￨O neagh￨O￨O ,￨O￨O which￨O￨O is￨O￨O notably￨O￨O larger￨O￨O than￨O￨O other￨O￨O lakes￨O￨O on￨O￨O the￨O￨O isles￨O￨O ,￨O￨O covers￨O￨O 390￨NUMBER￨O square￨O￨O kilometres￨O￨O (￨O￨O 151￨NUMBER￨O sq￨O￨O mi￨O￨O )￨O￨O .￨O￨O the￨O￨O climate￨O￨O is￨O￨O temperate￨O￨O marine￨TITLE￨O ,￨O￨O with￨O￨O mild￨O￨O winters￨SET￨O and￨O￨O warm￨O￨O summers￨SET￨O .￨O￨O the￨O￨O north￨O￨O atlantic￨O￨O drift￨O￨O brings￨O￨O significant￨O￨O moisture￨O￨O and￨O￨O raises￨O￨O temperatures￨O￨O 11￨NUMBER￨O °￨O￨O c￨O￨O (￨O￨O 20￨NUMBER￨O °￨O￨O f￨O￨O )￨O￨O above￨O￨O the￨O￨O global￨O￨O average￨O￨O for￨O￨O the￨O￨O latitude￨O￨O .￨O￨O this￨O￨O led￨O￨O to￨O￨O a￨O￨O landscape￨O￨O which￨O￨O was￨O￨O long￨O￨O dominated￨O￨O by￨O￨O temperate￨O￨O rainforest￨O￨O ,￨O￨O although￨O￨O human￨O￨O activity￨O￨O has￨O￨O since￨O￨O cleared￨O￨O the￨O￨O vast￨O￨O majority￨O￨O of￨O￨O forest￨O￨O cover￨O￨O .￨O￨O the￨O￨O region￨O￨O was￨O￨O re￨O￨O -￨O￨O inhabited￨O￨O after￨O￨O the￨O￨O last￨O￨O glacial￨O￨O period￨O￨O of￨O￨O quaternary￨O￨O glaciation￨O￨O ,￨O￨O by￨O￨O 12￨NUMBER￨O ,￨O￨O 000￨DATE￨O bc￨DATE￨O when￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O was￨O￨O still￨O￨O a￨O￨O peninsula￨O￨O of￨O￨O the￨O￨O european￨NATIONALITY￨O continent￨O￨O .￨O￨O ireland￨COUNTRY￨O ,￨O￨O which￨O￨O became￨O￨O an￨O￨O island￨O￨O by￨O￨O 12￨NUMBER￨O ,￨O￨O 000￨DATE￨O bc￨DATE￨O ,￨O￨O was￨O￨O not￨O￨O inhabited￨O￨O until￨O￨O after￨O￨B 8000￨DATE￨I bc￨DATE￨I .￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O became￨O￨O an￨O￨O island￨O￨O by￨O￨O 5600￨DATE￨O bc￨DATE￨O .￨O￨O when￨O￨O is￨O￨O it￨O￨O believed￨O￨O that￨O￨O ireland￨COUNTRY￨O became￨O￨O inhabited￨O￨O ?￨O￨O hiberni￨O￨O (￨O￨O ireland￨COUNTRY￨O )￨O￨O ,￨O￨O pictish￨O￨O (￨O￨O northern￨O￨O britain￨COUNTRY￨O )￨O￨O and￨O￨O britons￨O￨O (￨O￨O southern￨O￨O britain￨COUNTRY￨O )￨O￨O tribes￨O￨O ,￨O￨O all￨O￨O speaking￨O￨O insular￨O￨O celtic￨O￨O ,￨O￨O inhabited￨O￨O the￨O￨O islands￨O￨O at￨O￨O the￨DATE￨O beginning￨DATE￨B of￨DATE￨I the￨DATE￨I 1st￨DATE￨I millennium￨DATE￨I ad￨O￨I .￨O￨O much￨O￨O of￨O￨O brittonic￨O￨O -￨O￨O controlled￨O￨O britain￨COUNTRY￨O was￨O￨O conquered￨O￨O by￨O￨O the￨O￨O roman￨MISC￨O empire￨O￨O from￨O￨O ad￨O￨O 43￨NUMBER￨O .￨O￨O the￨O￨O first￨ORDINAL￨O anglo￨O￨O -￨O￨O saxons￨O￨O arrived￨O￨O as￨O￨O roman￨O￨O power￨O￨O waned￨O￨O in￨O￨O the￨DATE￨O 5th￨DATE￨O century￨DATE￨O and￨O￨O eventually￨O￨O dominated￨O￨O the￨O￨O bulk￨O￨O of￨O￨O what￨O￨O is￨O￨O now￨DATE￨O england￨COUNTRY￨O .￨O￨O viking￨O￨O invasions￨O￨O began￨O￨O in￨O￨O the￨DATE￨O 9th￨DATE￨O century￨DATE￨O ,￨O￨O followed￨O￨O by￨O￨O more￨O￨O permanent￨O￨O settlements￨O￨O and￨O￨O political￨O￨O change￨O￨O —￨O￨O particularly￨O￨O in￨O￨O england￨COUNTRY￨O .￨O￨O the￨O￨O subsequent￨O￨O norman￨O￨O conquest￨O￨O of￨O￨O england￨COUNTRY￨O in￨O￨O 1066￨DATE￨O and￨O￨O the￨O￨O later￨O￨O angevin￨O￨O partial￨O￨O conquest￨O￨O of￨O￨O ireland￨COUNTRY￨O from￨O￨O 1169￨DATE￨O led￨O￨O to￨O￨O the￨O￨O imposition￨O￨O of￨O￨O a￨O￨O new￨O￨O norman￨O￨O ruling￨O￨O elite￨O￨O across￨O￨O much￨O￨O of￨O￨O britain￨COUNTRY￨O and￨O￨O parts￨O￨O of￨O￨O ireland￨COUNTRY￨O .￨O￨O by￨O￨O the￨O￨O late￨O￨O middle￨O￨O ages￨O￨O ,￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O was￨O￨O separated￨O￨O into￨O￨O the￨O￨O kingdoms￨O￨O of￨O￨O england￨COUNTRY￨O and￨O￨O scotland￨COUNTRY￨O ,￨O￨O while￨O￨O control￨O￨O in￨O￨O ireland￨COUNTRY￨O fluxed￨O￨O between￨O￨O gaelic￨O￨O kingdoms￨O￨O ,￨O￨O hiberno￨O￨O -￨O￨O norman￨O￨O lords￨O￨O and￨O￨O the￨O￨O english￨NATIONALITY￨O -￨O￨O dominated￨O￨O lordship￨O￨O of￨O￨O ireland￨COUNTRY￨O ,￨O￨O soon￨O￨O restricted￨O￨O only￨O￨O to￨O￨O the￨O￨O pale￨O￨O .￨O￨O the￨O￨O 1603￨DATE￨O union￨O￨O of￨O￨O the￨O￨O crowns￨O￨O ,￨O￨O acts￨O￨O of￨O￨O union￨O￨O 1707￨DATE￨O and￨O￨O acts￨O￨O of￨O￨O union￨O￨O 1800￨DATE￨O attempted￨O￨O to￨O￨O consolidate￨O￨O britain￨COUNTRY￨O and￨O￨O ireland￨COUNTRY￨O into￨O￨O a￨O￨O single￨O￨O political￨O￨O unit￨O￨O ,￨O￨O the￨O￨O united￨ORGANIZATION￨O kingdom￨ORGANIZATION￨O of￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O and￨O￨O ireland￨COUNTRY￨O ,￨O￨O with￨O￨O the￨O￨O isle￨O￨O of￨O￨O man￨O￨O and￨O￨O the￨O￨O channel￨O￨O islands￨O￨O remaining￨O￨O as￨O￨O crown￨O￨O dependencies￨O￨O .￨O￨O the￨O￨O expansion￨O￨O of￨O￨O the￨O￨O british￨NATIONALITY￨O empire￨O￨O and￨O￨O migrations￨O￨O following￨O￨O the￨O￨O irish￨NATIONALITY￨O famine￨O￨O and￨O￨O highland￨O￨O clearances￨O￨O resulted￨O￨O in￨O￨O the￨O￨O distribution￨O￨O of￨O￨O the￨O￨O islands￨O￨O '￨O￨O population￨O￨O and￨O￨O culture￨O￨O throughout￨O￨O the￨O￨O world￨O￨O and￨O￨O a￨O￨O rapid￨O￨O de￨O￨O -￨O￨O population￨O￨O of￨O￨O ireland￨COUNTRY￨O in￨O￨O the￨DATE￨O second￨DATE￨O half￨DATE￨O of￨DATE￨O the￨DATE￨O 19th￨DATE￨O century￨DATE￨O .￨O￨O most￨O￨O of￨O￨O ireland￨COUNTRY￨O seceded￨O￨O from￨O￨O the￨O￨O united￨COUNTRY￨O kingdom￨COUNTRY￨O after￨O￨O the￨O￨O irish￨NATIONALITY￨O war￨CAUSE_OF_DEATH￨O of￨O￨O independence￨O￨O and￨O￨O the￨O￨O subsequent￨O￨O anglo￨O￨O -￨O￨O irish￨NATIONALITY￨O treaty￨O￨O (￨O￨O 1919￨DATE￨O –￨O￨O 1922￨DATE￨O )￨O￨O ,￨O￨O with￨O￨O six￨NUMBER￨O counties￨O￨O remaining￨O￨O in￨O￨O the￨O￨O uk￨COUNTRY￨O as￨O￨O northern￨O￨O ireland￨COUNTRY￨O .￨O￨O when￨O￨O did￨O￨O the￨O￨O pictish￨O￨O tribe￨O￨O start￨O￨O to￨O￨O inhabit￨O￨O the￨O￨O islands￨O￨O ?￨O￨O the￨O￨O greco￨O￨O -￨O￨O egyptian￨NATIONALITY￨O scientist￨TITLE￨O claudius￨O￨O ptolemy￨O￨O referred￨O￨O to￨O￨O the￨O￨O larger￨O￨O island￨O￨O as￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O (￨O￨O μεγαλης￨O￨O βρεττανιας￨O￨O -￨O￨O megalis￨O￨O brettanias￨O￨O )￨O￨O and￨O￨O to￨O￨O ireland￨COUNTRY￨O as￨O￨O little￨O￨O britain￨COUNTRY￨O (￨O￨O μικρης￨O￨O βρεττανιας￨O￨O -￨O￨O mikris￨O￨O brettanias￨O￨O )￨O￨O in￨O￨O his￨O￨O work￨O￨O almagest￨O￨O (￨O￨O 147￨NUMBER￨O –￨O￨O 148￨DATE￨O ad￨DATE￨O )￨O￨O .￨O￨O in￨O￨O his￨O￨O later￨O￨O work￨O￨O ,￨O￨O geography￨O￨O (￨O￨O c￨O￨O .￨O￨O 150￨DATE￨O ad￨DATE￨O )￨O￨O ,￨O￨O he￨O￨O gave￨O￨O these￨O￨O islands￨O￨O the￨O￨O names￨O￨O alwion￨O￨B ,￨O￨I iwernia￨O￨I ,￨O￨I and￨O￨I mona￨O￨I (￨O￨O the￨O￨O isle￨O￨O of￨O￨O man￨O￨O )￨O￨O ,￨O￨O suggesting￨O￨O these￨O￨O may￨O￨O have￨O￨O been￨O￨O names￨O￨O of￨O￨O the￨O￨O individual￨O￨O islands￨O￨O not￨O￨O known￨O￨O to￨O￨O him￨O￨O at￨O￨O the￨O￨O time￨DATE￨O of￨O￨O writing￨O￨O almagest￨O￨O .￨O￨O the￨O￨O name￨O￨O albion￨COUNTRY￨O appears￨O￨O to￨O￨O have￨O￨O fallen￨O￨O out￨O￨O of￨O￨O use￨O￨O sometime￨O￨O after￨O￨O the￨O￨O roman￨MISC￨O conquest￨O￨O of￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O ,￨O￨O after￨O￨O which￨O￨O britain￨COUNTRY￨O became￨O￨O the￨O￨O more￨O￨O commonplace￨O￨O name￨O￨O for￨O￨O the￨O￨O island￨O￨O called￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O .￨O￨O in￨O￨O later￨O￨O writings￨O￨O ,￨O￨O what￨O￨O did￨O￨O claudius￨O￨O ptolemy￨O￨O called￨O￨O the￨O￨O british￨NATIONALITY￨O isles￨O￨O ?￨O￨O the￨O￨O earliest￨O￨O known￨O￨O use￨O￨O of￨O￨O the￨O￨O phrase￨O￨O brytish￨O￨O iles￨O￨O in￨O￨O the￨O￨O english￨NATIONALITY￨O language￨O￨O is￨O￨O dated￨O￨O 1577￨DATE￨B in￨O￨O a￨O￨O work￨O￨O by￨O￨O john￨O￨O dee￨O￨O .￨O￨O today￨DATE￨O ,￨O￨O this￨O￨O name￨O￨O is￨O￨O seen￨O￨O by￨O￨O some￨O￨O as￨O￨O carrying￨O￨O imperialist￨O￨O overtones￨O￨O although￨O￨O it￨O￨O is￨O￨O still￨O￨O commonly￨O￨O used￨O￨O .￨O￨O other￨O￨O names￨O￨O used￨O￨O to￨O￨O describe￨O￨O the￨O￨O islands￨O￨O include￨O￨O the￨O￨O anglo￨O￨O -￨O￨O celtic￨O￨O isles￨O￨O ,￨O￨O atlantic￨O￨O archipelago￨O￨O ,￨O￨O british￨NATIONALITY￨O -￨O￨O irish￨NATIONALITY￨O isles￨O￨O ,￨O￨O britain￨COUNTRY￨O and￨O￨O ireland￨COUNTRY￨O ,￨O￨O uk￨COUNTRY￨O and￨O￨O ireland￨COUNTRY￨O ,￨O￨O and￨O￨O british￨NATIONALITY￨O isles￨O￨O and￨O￨O ireland￨COUNTRY￨O .￨O￨O owing￨O￨O to￨O￨O political￨O￨O and￨O￨O national￨O￨O associations￨O￨O with￨O￨O the￨O￨O word￨O￨O british￨NATIONALITY￨O ,￨O￨O the￨O￨O government￨O￨O of￨O￨O ireland￨COUNTRY￨O does￨O￨O not￨O￨O use￨O￨O the￨O￨O term￨O￨O british￨NATIONALITY￨O isles￨O￨O and￨O￨O in￨O￨O documents￨O￨O drawn￨O￨O up￨O￨O jointly￨O￨O between￨O￨O the￨O￨O british￨NATIONALITY￨O and￨O￨O irish￨NATIONALITY￨O governments￨O￨O ,￨O￨O the￨O￨O archipelago￨O￨O is￨O￨O referred￨O￨O to￨O￨O simply￨O￨O as￨O￨O "￨O￨O these￨O￨O islands￨O￨O "￨O￨O .￨O￨O nonetheless￨O￨O ,￨O￨O british￨NATIONALITY￨O isles￨O￨O is￨O￨O still￨O￨O the￨O￨O most￨O￨O widely￨O￨O accepted￨O￨O term￨O￨O for￨O￨O the￨O￨O archipelago￨O￨O .￨O￨O when￨O￨O was￨O￨O this￨O￨O brytish￨O￨O illes￨O￨O name￨O￨O used￨O￨O in￨O￨O the￨O￨O english￨NATIONALITY￨O language￨O￨O by￨O￨O john￨O￨O dee￨O￨O ?￨O￨O the￨O￨O british￨NATIONALITY￨O isles￨O￨O lie￨O￨O at￨O￨O the￨O￨O juncture￨O￨O of￨O￨O several￨O￨O regions￨O￨O with￨O￨O past￨DATE￨O episodes￨O￨O of￨O￨O tectonic￨O￨O mountain￨O￨O building￨O￨O .￨O￨O these￨O￨O orogenic￨O￨O belts￨O￨O form￨O￨O a￨O￨O complex￨O￨O geology￨O￨O that￨O￨O records￨O￨O a￨O￨O huge￨O￨O and￨O￨O varied￨O￨O span￨O￨O of￨O￨O earth￨O￨O 's￨O￨O history￨O￨O .￨O￨O of￨O￨O particular￨O￨O note￨O￨O was￨O￨O the￨O￨O caledonian￨O￨O orogeny￨O￨O during￨O￨O the￨O￨O ordovician￨MISC￨O period￨O￨O ,￨O￨O c￨O￨O .￨O￨O 488￨NUMBER￨O –￨O￨O 444￨NUMBER￨O ma￨O￨O and￨O￨O early￨O￨O silurian￨O￨O period￨O￨O ,￨O￨O when￨O￨O the￨O￨O craton￨O￨O baltica￨O￨O collided￨O￨O with￨O￨O the￨O￨O terrane￨O￨O avalonia￨O￨O to￨O￨O form￨O￨O the￨O￨O mountains￨O￨B and￨O￨I hills￨O￨I in￨O￨O northern￨O￨O britain￨COUNTRY￨O and￨O￨O ireland￨COUNTRY￨O .￨O￨O baltica￨O￨O formed￨O￨O roughly￨O￨O the￨O￨O northwestern￨O￨O half￨O￨O of￨O￨O ireland￨COUNTRY￨O and￨O￨O scotland￨COUNTRY￨O .￨O￨O further￨O￨O collisions￨O￨O caused￨O￨O the￨O￨O variscan￨O￨O orogeny￨O￨O in￨O￨O the￨O￨O devonian￨O￨O and￨O￨O carboniferous￨O￨O periods￨O￨O ,￨O￨O forming￨O￨O the￨O￨O hills￨O￨O of￨O￨O munster￨O￨O ,￨O￨O southwest￨O￨O england￨COUNTRY￨O ,￨O￨O and￨O￨O southern￨O￨O wales￨O￨O .￨O￨O over￨O￨O the￨DURATION￨O last￨DURATION￨O 500￨DURATION￨O million￨DURATION￨O years￨DURATION￨O the￨O￨O land￨O￨O that￨O￨O forms￨O￨O the￨O￨O islands￨O￨O has￨O￨O drifted￨O￨O northwest￨O￨O from￨O￨O around￨O￨O 30￨NUMBER￨O °￨O￨O s￨O￨O ,￨O￨O crossing￨O￨O the￨O￨O equator￨O￨O around￨O￨O 370￨DATE￨O million￨DATE￨O years￨DATE￨O ago￨DATE￨O to￨O￨O reach￨O￨O its￨O￨O present￨DATE￨O northern￨O￨O latitude￨O￨O .￨O￨O what￨O￨O formed￨O￨O after￨O￨O the￨O￨O craton￨O￨O baltica￨O￨O and￨O￨O the￨O￨O terrane￨O￨O avalonia￨O￨O collision￨O￨O ?￨O￨O the￨O￨O islands￨O￨O are￨O￨O at￨O￨O relatively￨O￨O low￨O￨O altitudes￨O￨O ,￨O￨O with￨O￨O central￨O￨O ireland￨COUNTRY￨O and￨O￨O southern￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O particularly￨O￨O low￨O￨O lying￨O￨O :￨O￨O the￨O￨O lowest￨O￨O point￨O￨O in￨O￨O the￨O￨O islands￨O￨O is￨O￨O holme￨O￨O ,￨O￨O cambridgeshire￨STATE_OR_PROVINCE￨O at￨O￨O −￨O￨O 2￨NUMBER￨O .￨O￨O 75￨NUMBER￨O m￨O￨O (￨O￨O −￨O￨O 9￨NUMBER￨O .￨O￨O 02￨NUMBER￨O ft￨O￨O )￨O￨O .￨O￨O the￨O￨O scottish￨NATIONALITY￨O highlands￨O￨O in￨O￨O the￨O￨O northern￨O￨O part￨O￨O of￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O are￨O￨O mountainous￨O￨O ,￨O￨B with￨O￨I ben￨O￨O nevis￨O￨O being￨O￨O the￨O￨O highest￨O￨O point￨O￨O on￨O￨O the￨O￨O islands￨O￨O at￨O￨O 1￨NUMBER￨O ,￨O￨O 343￨NUMBER￨O m￨O￨O (￨O￨O 4￨NUMBER￨O ,￨O￨O 406￨NUMBER￨O ft￨O￨O )￨O￨O .￨O￨O other￨O￨O mountainous￨O￨O areas￨O￨O include￨O￨O wales￨O￨O and￨O￨O parts￨O￨O of￨O￨O ireland￨COUNTRY￨O ,￨O￨O however￨O￨O only￨O￨O seven￨NUMBER￨O peaks￨O￨O in￨O￨O these￨O￨O areas￨O￨O reach￨O￨O above￨O￨O 1￨NUMBER￨O ,￨O￨O 000￨NUMBER￨O m￨O￨O (￨O￨O 3￨NUMBER￨O ,￨O￨O 281￨NUMBER￨O ft￨O￨O )￨O￨O .￨O￨O lakes￨O￨O on￨O￨O the￨O￨O islands￨O￨O are￨O￨O generally￨O￨O not￨O￨O large￨O￨O ,￨O￨O although￨O￨O lough￨O￨O neagh￨O￨O in￨O￨O northern￨O￨O ireland￨COUNTRY￨O is￨O￨O an￨O￨O exception￨O￨O ,￨O￨O covering￨O￨O 150￨NUMBER￨O square￨O￨O miles￨O￨O (￨O￨O 390￨NUMBER￨O km2￨O￨O )￨O￨O .￨O￨O [￨O￨O citation￨O￨O needed￨O￨O ]￨O￨O the￨O￨O largest￨O￨O freshwater￨O￨O body￨O￨O in￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O (￨O￨O by￨O￨O area￨O￨O )￨O￨O is￨O￨O loch￨O￨O lomond￨O￨O at￨O￨O 27￨NUMBER￨O .￨O￨O 5￨NUMBER￨O square￨O￨O miles￨O￨O (￨O￨O 71￨NUMBER￨O km2￨O￨O )￨O￨O ,￨O￨O and￨O￨O loch￨O￨O ness￨O￨O ,￨O￨O by￨O￨O volume￨O￨O whilst￨O￨O loch￨O￨O morar￨O￨O is￨O￨O the￨O￨O deepest￨O￨O freshwater￨O￨O body￨O￨O in￨O￨O the￨O￨O british￨NATIONALITY￨O isles￨O￨O ,￨O￨O with￨O￨O a￨O￨O maximum￨O￨O depth￨O￨O of￨O￨O 310￨NUMBER￨O m￨O￨O (￨O￨O 1￨NUMBER￨O ,￨O￨O 017￨NUMBER￨O ft￨O￨O )￨O￨O .￨O￨O there￨O￨O are￨O￨O a￨O￨O number￨O￨O of￨O￨O major￨O￨O rivers￨O￨O within￨O￨O the￨O￨O british￨NATIONALITY￨O isles￨O￨O .￨O￨O the￨O￨O longest￨O￨O is￨O￨O the￨O￨O shannon￨O￨O in￨O￨O ireland￨COUNTRY￨O at￨O￨O 224￨NUMBER￨O mi￨O￨O (￨O￨O 360￨NUMBER￨O km￨O￨O )￨O￨O .￨O￨O [￨O￨O citation￨O￨O needed￨O￨O ]￨O￨O the￨O￨O river￨TITLE￨O severn￨O￨O at￨O￨O 220￨NUMBER￨O mi￨O￨O (￨O￨O 354￨NUMBER￨O km￨O￨O )￨O￨O [￨O￨O citation￨O￨O needed￨O￨O ]￨O￨O is￨O￨O the￨O￨O longest￨O￨O in￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O .￨O￨O the￨O￨O isles￨O￨O have￨O￨O a￨O￨O temperate￨O￨O marine￨TITLE￨O climate￨O￨O .￨O￨O the￨O￨O north￨O￨O atlantic￨O￨O drift￨O￨O (￨O￨O "￨O￨O gulf￨O￨O stream￨O￨O "￨O￨O )￨O￨O which￨O￨O flows￨O￨O from￨O￨O the￨O￨O gulf￨O￨O of￨O￨O mexico￨COUNTRY￨O brings￨O￨O with￨O￨O it￨O￨O significant￨O￨O moisture￨O￨O and￨O￨O raises￨O￨O temperatures￨O￨O 11￨NUMBER￨O °￨O￨O c￨O￨O (￨O￨O 20￨NUMBER￨O °￨O￨O f￨O￨O )￨O￨O above￨O￨O the￨O￨O global￨O￨O average￨O￨O for￨O￨O the￨O￨O islands￨O￨O '￨O￨O latitudes￨O￨O .￨O￨O winters￨SET￨O are￨O￨O cool￨O￨O and￨O￨O wet￨O￨O ,￨O￨O with￨O￨O summers￨SET￨O mild￨O￨O and￨O￨O also￨O￨O wet￨O￨O .￨O￨O most￨O￨O atlantic￨O￨O depressions￨O￨O pass￨O￨O to￨O￨O the￨O￨O north￨O￨O of￨O￨O the￨O￨O islands￨O￨O ,￨O￨O combined￨O￨O with￨O￨O the￨O￨O general￨TITLE￨O westerly￨O￨O circulation￨O￨O and￨O￨O interactions￨O￨O with￨O￨O the￨O￨O landmass￨O￨O ,￨O￨O this￨O￨O imposes￨O￨O an￨O￨O east￨O￨O -￨O￨O west￨O￨O variation￨O￨O in￨O￨O climate￨O￨O .￨O￨O where￨O￨O is￨O￨O the￨O￨O highest￨O￨O point￨O￨O in￨O￨O the￨O￨O british￨NATIONALITY￨O isles￨O￨O ?￨O￨O the￨O￨O islands￨O￨O enjoy￨O￨O a￨O￨O mild￨O￨O climate￨O￨O and￨O￨O varied￨O￨O soils￨O￨O ,￨O￨O giving￨O￨O rise￨O￨O to￨O￨O a￨O￨O diverse￨O￨O pattern￨O￨O of￨O￨O vegetation￨O￨O .￨O￨O animal￨O￨O and￨O￨O plant￨O￨O life￨O￨O is￨O￨O similar￨O￨O to￨O￨O that￨O￨O of￨O￨O the￨O￨O northwestern￨O￨O european￨NATIONALITY￨O continent￨O￨O .￨O￨O there￨O￨O are￨O￨O however￨O￨O ,￨O￨O fewer￨O￨O numbers￨O￨O of￨O￨O species￨O￨O ,￨O￨O with￨O￨O ireland￨COUNTRY￨O having￨O￨O even￨O￨O less￨O￨O .￨O￨O all￨O￨O native￨O￨O flora￨O￨O and￨O￨O fauna￨O￨O in￨O￨O ireland￨COUNTRY￨O is￨O￨O made￨O￨O up￨O￨O of￨O￨O species￨O￨O that￨O￨O migrated￨O￨O from￨O￨O elsewhere￨O￨O in￨O￨O europe￨O￨O ,￨O￨O and￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O in￨O￨O particular￨O￨O .￨O￨O the￨O￨O only￨O￨O window￨O￨O when￨O￨O this￨O￨O could￨O￨O have￨O￨O occurred￨O￨O was￨O￨O between￨O￨O the￨O￨O end￨O￨O of￨O￨O the￨O￨O last￨O￨O ice￨O￨O age￨O￨O (￨O￨O about￨O￨B 12￨NUMBER￨I ,￨O￨I 000￨DATE￨I years￨DATE￨I ago￨DATE￨I )￨O￨O and￨O￨O when￨O￨O the￨O￨O land￨O￨O bridge￨O￨O connecting￨O￨O the￨O￨O two￨NUMBER￨O islands￨O￨O was￨O￨O flooded￨O￨O by￨O￨O sea￨O￨O (￨O￨O about￨O￨O 8￨NUMBER￨O ,￨O￨O 000￨DATE￨O years￨DATE￨O ago￨DATE￨O )￨O￨O .￨O￨O when￨O￨O did￨O￨O the￨O￨O last￨O￨O ice￨O￨O age￨O￨O end￨O￨O in￨O￨O the￨O￨O british￨NATIONALITY￨O isles￨O￨O ?￨O￨O at￨O￨O the￨O￨O time￨DATE￨O of￨O￨O the￨O￨O roman￨MISC￨O empire￨O￨O ,￨O￨O about￨DATE￨O two￨DATE￨O thousand￨DATE￨O years￨DATE￨O ago￨DATE￨O ,￨O￨O various￨O￨O tribes￨O￨O ,￨O￨O which￨O￨O spoke￨O￨O celtic￨O￨B dialects￨O￨I of￨O￨I the￨O￨I insular￨O￨I celtic￨O￨I group￨O￨I ,￨O￨O were￨O￨O inhabiting￨O￨O the￨O￨O islands￨O￨O .￨O￨O the￨O￨O romans￨O￨O expanded￨O￨O their￨O￨O civilisation￨O￨O to￨O￨O control￨O￨O southern￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O but￨O￨O were￨O￨O impeded￨O￨O in￨O￨O advancing￨O￨O any￨O￨O further￨O￨O ,￨O￨O building￨O￨O hadrian￨O￨O 's￨O￨O wall￨O￨O to￨O￨O mark￨O￨O the￨O￨O northern￨O￨O frontier￨O￨O of￨O￨O their￨O￨O empire￨O￨O in￨O￨O 122￨DATE￨O ad￨DATE￨O .￨O￨O at￨O￨O that￨O￨O time￨O￨O ,￨O￨O ireland￨COUNTRY￨O was￨O￨O populated￨O￨O by￨O￨O a￨O￨O people￨O￨O known￨O￨O as￨O￨O hiberni￨O￨O ,￨O￨O the￨O￨O northern￨O￨O third￨ORDINAL￨O or￨O￨O so￨O￨O of￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O by￨O￨O a￨O￨O people￨O￨O known￨O￨O as￨O￨O picts￨O￨O and￨O￨O the￨O￨O southern￨O￨O two￨NUMBER￨O thirds￨O￨O by￨O￨O britons￨O￨O .￨O￨O the￨O￨O people￨O￨O that￨O￨O lived￨O￨O in￨O￨O the￨O￨O british￨NATIONALITY￨O isles￨O￨O during￨O￨O the￨O￨O roman￨MISC￨O empire￨O￨O era￨O￨O spoke￨O￨O which￨O￨O language￨O￨O ?￨O￨O at￨O￨O the￨O￨O time￨DATE￨O of￨O￨O the￨O￨O roman￨MISC￨O empire￨O￨O ,￨O￨O about￨DATE￨O two￨DATE￨O thousand￨DATE￨O years￨DATE￨O ago￨DATE￨O ,￨O￨O various￨O￨O tribes￨O￨O ,￨O￨O which￨O￨O spoke￨O￨O celtic￨O￨O dialects￨O￨O of￨O￨O the￨O￨O insular￨O￨O celtic￨O￨O group￨O￨O ,￨O￨O were￨O￨O inhabiting￨O￨O the￨O￨O islands￨O￨O .￨O￨O the￨O￨O romans￨O￨O expanded￨O￨O their￨O￨O civilisation￨O￨O to￨O￨O control￨O￨O southern￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O but￨O￨O were￨O￨O impeded￨O￨O in￨O￨O advancing￨O￨O any￨O￨O further￨O￨O ,￨O￨O building￨O￨O hadrian￨O￨O 's￨O￨O wall￨O￨O to￨O￨O mark￨O￨O the￨O￨O northern￨O￨O frontier￨O￨O of￨O￨O their￨O￨O empire￨O￨O in￨O￨O 122￨DATE￨O ad￨DATE￨O .￨O￨O at￨O￨O that￨O￨O time￨O￨O ,￨O￨O ireland￨COUNTRY￨O was￨O￨O populated￨O￨O by￨O￨O a￨O￨O people￨O￨O known￨O￨O as￨O￨O hiberni￨O￨O ,￨O￨O the￨O￨O northern￨O￨O third￨ORDINAL￨O or￨O￨O so￨O￨O of￨O￨O great￨COUNTRY￨O britain￨COUNTRY￨O by￨O￨O a￨O￨O people￨O￨O known￨O￨O as￨O￨O picts￨O￨O and￨O￨O the￨O￨O southern￨O￨O two￨NUMBER￨O thirds￨O￨O by￨O￨O britons￨O￨B .￨O￨O what￨O￨O was￨O￨O the￨O￨O name￨O￨O of￨O￨O the￨O￨O native￨O￨O people￨O￨O that￨O￨O lived￨O￨O in￨O￨O the￨O￨O southern￨O￨O parts￨O￨O of￨O￨O the￨O￨O british￨NATIONALITY￨O isles￨O￨O during￨O￨O the￨O￨O roman￨MISC￨O empire￨O￨O occupation￨O￨O ?￨O|O

train-target.txt: what are the only teo islands belonging to the british isles ireland became an old reock in 12 , 000 bc but was n't inhabited until when ? the pictish tribe of southern ireland inhabited the islands when ? what islands were descovered in 150 ad ? john doe cites the earliest known use of brytish iles as occurring in what year ? what formed before the craton baltica and avalonia collision ? at 9 , 000 feet tall , what is the highest point on the islands ? the beginning of the last ice age occurred when ? during the briton empire , tribes spoke which dialect ? the northern third of ireland was inhabited by picts , while the southern two thirds by who ?

chens72 commented 3 years ago

When I used the following command for preprocessing, the problem was solved. onmt_preprocess \ -train_src data/Features/train-src-ans-features.txt \ -train_tgt data/Features/train-target.txt \ -valid_src data/Features/dev-src-ans-features.txt \ -valid_tgt data/Features/dev-target.txt \ -save_data data/Features/QGprocess-features \ -dynamic_dict \ -share_vocab \ -lower \ -overwrite \ -src_seq_length 3500 \ -tgt_seq_length 200

OpenNMT / OpenNMT-py

Preprocess：tgt_vocab_size is always four #1961