What's the dataset used for training opus-mt-en-de

I'm trying to get the dataset used for training opus-mt-en-de for my own model training. With opus_get --source en --target de --list, I found the dataset is very big and contains many submodules:

458 KB https://object.pouta.csc.fi/OPUS-Books/v1/xml/de-en.xml.gz
  13 MB https://object.pouta.csc.fi/OPUS-Books/v1/xml/de.zip
  71 MB https://object.pouta.csc.fi/OPUS-Books/v1/xml/en.zip
 881 MB https://object.pouta.csc.fi/OPUS-CCAligned/v1/xml/de-en.xml.gz
  10 GB https://object.pouta.csc.fi/OPUS-CCAligned/v1/xml/de.zip
  87 GB https://object.pouta.csc.fi/OPUS-CCAligned/v1/xml/en.zip
  ......
 586 KB https://object.pouta.csc.fi/OPUS-bible-uedin/v1/xml/de-en.xml.gz
   7 MB https://object.pouta.csc.fi/OPUS-bible-uedin/v1/xml/de.zip
  14 MB https://object.pouta.csc.fi/OPUS-bible-uedin/v1/xml/en.zip
   1 MB https://object.pouta.csc.fi/OPUS-wikimedia/v20210402/xml/de-en.xml.gz
  65 MB https://object.pouta.csc.fi/OPUS-wikimedia/v20210402/xml/de.zip
   2 GB https://object.pouta.csc.fi/OPUS-wikimedia/v20210402/xml/en.zip

 552 GB Total size

Did you use all the 552GB data for training? Or only one or several submodules?

Many thanks!!

Helsinki-NLP / OPUS-MT-train

What's the dataset used for training opus-mt-en-de #74