Shark-ML / Shark

The Shark Machine Learning Library. See more:
http://shark-ml.github.io/Shark/
GNU Lesser General Public License v3.0

[SharedContainer::splitBlock] Container is not Independent #259

Closed axiqia closed 5 years ago

axiqia commented 5 years ago

`normalizing_train.train(normalizer_train, training_data.inputs())`

this should fix your issue. Please remember not to train a normalizer on the test set and use

normalizedData_test = transformInputs(test_data, normalizer_train);

Originally posted by @Ulfgard in https://github.com/Shark-ML/Shark/issues/51#issuecomment-189819470
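The advice quoted above (fit the normalizer on the training set only, then reuse it for the test set) can be illustrated without Shark. The following is a minimal standalone sketch, not Shark's implementation: `FeatureScaler`, `fitScaler`, and `applyScaler` are hypothetical names, and the use of the population standard deviation is an assumption made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Rows = std::vector<std::vector<double>>;

// Hypothetical stand-in for a trained normalizer:
// one mean and one standard deviation per feature.
struct FeatureScaler {
    std::vector<double> mean;
    std::vector<double> stddev;
};

// Estimate the per-feature statistics from the training rows only.
FeatureScaler fitScaler(const Rows& train) {
    assert(!train.empty());
    const std::size_t dim = train.front().size();
    const double n = static_cast<double>(train.size());
    FeatureScaler s{std::vector<double>(dim, 0.0), std::vector<double>(dim, 0.0)};
    for (const auto& row : train)
        for (std::size_t j = 0; j < dim; ++j) s.mean[j] += row[j];
    for (double& m : s.mean) m /= n;
    for (const auto& row : train)
        for (std::size_t j = 0; j < dim; ++j) {
            const double d = row[j] - s.mean[j];
            s.stddev[j] += d * d;
        }
    for (double& v : s.stddev) {
        v = std::sqrt(v / n);   // population standard deviation (assumption)
        if (v == 0.0) v = 1.0;  // constant feature: avoid division by zero
    }
    return s;
}

// Apply already fitted statistics to any rows, training or test.
Rows applyScaler(const FeatureScaler& s, const Rows& rows) {
    Rows out = rows;
    for (auto& row : out)
        for (std::size_t j = 0; j < row.size(); ++j)
            row[j] = (row[j] - s.mean[j]) / s.stddev[j];
    return out;
}
```

The point of the design is that `fitScaler` never sees the test rows, so no information from the test set leaks into the preprocessing step.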

RegressionDataset raw_data;
importCSV(raw_data, argv[1], FIRST_COLUMN, 1, ' ');
raw_data.shuffle();
bool removeMean = true;
Normalizer<RealVector> normalizer;
NormalizeComponentsUnitVariance<RealVector> normalizingTrainer(removeMean);
normalizingTrainer.train(normalizer, raw_data.inputs());
RegressionDataset data = transformInputs(raw_data, normalizer);

 //Split the dataset into a training and a test dataset
RegressionDataset   dataTest = splitAtElement(data, static_cast<std::size_t>(0.8*data.numberOfElements()));

After normalizing the raw data, I got a runtime error when I split the data into dataTest.

terminate called after throwing an instance of 'shark::Exception'
  what():  [SharedContainer::splitBlock] Container is not Independent
[1]    1440 abort (core dumped)  ./ExampleProject  actfwd.model

Ulfgard commented 5 years ago

Arrgh! This might actually be a bug. I think transformInputs just copies the label container from raw_data, so the labels are still shared with raw_data.

Could you try whether calling the following before splitting helps?

data.labels().makeIndependent();

I have to ask myself whether the independence check is doing more harm than good here.
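To make the failure mode concrete, here is a minimal standalone sketch of a container whose copies share storage and which refuses to split while shared. It does not use Shark and is not Shark's implementation: `SharedBuffer` and `splitAt` are hypothetical, and only the names `makeIndependent` and "independent" are borrowed from this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical container: copying a SharedBuffer shares the underlying storage.
class SharedBuffer {
public:
    explicit SharedBuffer(std::vector<double> values)
        : data_(std::make_shared<std::vector<double>>(std::move(values))) {}

    // True when no other SharedBuffer refers to the same storage.
    bool isIndependent() const { return data_.use_count() == 1; }

    // Replace shared storage with a private deep copy.
    void makeIndependent() {
        if (!isIndependent())
            data_ = std::make_shared<std::vector<double>>(*data_);
    }

    // Move elements [index, end) into a new buffer. Splitting shared storage
    // would silently shrink every other copy, so it is refused.
    SharedBuffer splitAt(std::size_t index) {
        if (!isIndependent())
            throw std::logic_error("container is not independent");
        assert(index <= data_->size());
        const auto offset = static_cast<std::ptrdiff_t>(index);
        std::vector<double> tail(data_->begin() + offset, data_->end());
        data_->resize(index);
        return SharedBuffer(std::move(tail));
    }

    std::size_t size() const { return data_->size(); }

private:
    std::shared_ptr<std::vector<double>> data_;
};
```

In this sketch, a copy of a buffer throws on `splitAt` until `makeIndependent()` has been called on it, after which the split leaves the original buffer untouched. That mirrors the workaround suggested above, under the stated assumption that the labels of `data` share storage with `raw_data`.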

axiqia commented 5 years ago

It works :)

axiqia commented 5 years ago

Dear @Ulfgard, is there an API to transform a RealVector using a Normalizer? E.g.

      //load scaler from the file
      ifstream ifs2(argv[3]);
      TextInArchive ia2(ifs2);
      Normalizer<RealVector> normalizer2;
      normalizer2.load(ia2, 0);
      ifs2.close();

      double param[] = { 
      241.28,1, 1, 0, 1, 0, 1, 1, 1, 14745600, 14745600, 14745600, 14745600, 1, 0 ,0 ,1, 1, 1, 14745600, 14745600, 14745600, 14745600, 1
      };  
      RealVector onetest(23);
      std::copy(param, param+23, onetest.begin());
      RealVector points = transform(onetest, normalizer2);    

I could only find shark::transform(Data<T> const& data, Functor f). Thank you :)

Ulfgard commented 5 years ago

A normalizer is a model, so you can call it directly:

normalizer2(onetest)

This also works with Data as the argument if you happen to have many data points.

axiqia commented 5 years ago

So I can also use normalizer2(data.inputs()) and the result is the same as transformInputs(data.inputs())

Do I understand correctly?

Ulfgard commented 5 years ago

Yes, this will internally just call transform, so you are fine. It is purely a convenience.
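The idea that calling a model on a whole data set just forwards to an element-wise transform can be sketched in plain C++. This is an illustration only, not Shark code: `AffineNormalizer` and `transformAll` are hypothetical names, and the per-component formula `(x - offset) * scale` is an assumption chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Hypothetical helper: apply a functor to every point of a batch.
template <class Functor>
std::vector<Point> transformAll(const std::vector<Point>& batch, const Functor& f) {
    std::vector<Point> out;
    out.reserve(batch.size());
    for (const Point& p : batch) out.push_back(f(p));
    return out;
}

// Hypothetical model: y[i] = (x[i] - offset[i]) * scale[i].
struct AffineNormalizer {
    Point offset;
    Point scale;

    // Single point.
    Point operator()(const Point& x) const {
        assert(x.size() == offset.size() && x.size() == scale.size());
        Point y(x.size());
        for (std::size_t i = 0; i < x.size(); ++i)
            y[i] = (x[i] - offset[i]) * scale[i];
        return y;
    }

    // Batch of points: pure convenience, forwards to the element-wise transform.
    std::vector<Point> operator()(const std::vector<Point>& batch) const {
        return transformAll(batch, *this);
    }
};
```

With this layout, calling the model on a batch and calling `transformAll` with the model give the same result by construction, which is the equivalence asked about above.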


From: axiqia [notifications@github.com] Sent: Wednesday, October 24, 2018 4:09 PM To: Shark-ML/Shark Cc: Oswin Krause; Mention Subject: Re: [Shark-ML/Shark] [SharedContainer::splitBlock] Container is not Independent (#259)

Ulfgard commented 5 years ago

exactly