Open sschmidt23 opened 2 months ago
Hi,
I am trying to run with this option set. I am getting a dimension error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[119], line 1
----> 1 knn_estimated = estimate_knn.estimate(test_data_handle)
File ~/Documents/github/lincc/rail_base/src/rail/estimation/estimator.py:97, in CatEstimator.estimate(self, input_data)
73 """The main interface method for the photo-z estimation
74
75 This will attach the input data (defined in ``inputs`` as "input") to this
(...)
94 Handle providing access to QP ensemble with output data
95 """
96 self.set_data("input", input_data)
---> 97 self.run()
98 self.finalize()
99 return self.get_handle("output")
File ~/Documents/github/lincc/rail_base/src/rail/estimation/estimator.py:110, in CatEstimator.run(self)
108 for s, e, test_data in iterator:
109 print(f"Process {self.rank} running estimator on chunk {s} - {e}")
--> 110 self._process_chunk(s, e, test_data, first)
111 first = False
112 # Running garbage collection manually seems to be needed
113 # to avoid memory growth for some estimators
File ~/Documents/github/lincc/rail_sklearn/src/rail/estimation/algos/k_nearneigh.py:195, in KNearNeighEstimator._process_chunk(self, start, end, data, first)
192 knn_df.loc[np.isclose(knn_df[col], self.config.nondetect_val), col] = np.float32(self.config.mag_limits[col])
194 testcolordata = _computecolordata(knn_df, self.config.ref_band, self.config.bands, self.config.only_colors)
--> 195 dists, idxs = self.kdtree.query(testcolordata, k=self.numneigh)
196 dists += TEENY
197 test_ens = _makepdf(dists, idxs, self.trainszs, self.sigma)
File sklearn/neighbors/_binary_tree.pxi:1183, in sklearn.neighbors._kd_tree.BinaryTree64.query()
ValueError: query data dimension must match training data dimension
Do I need to change the input data handle after setting only_colors=True
in the informer?
I think I set things up so that you had to set the config parameter only_colors=True
in both the inform and estimate stage, is it possible that you only set it in one of the two, and thus the model expects a different number of parameters? I could store the boolean for only_colors
in the model to ensure that a user can't accidentally use differing values for inform and estimate, if that is less confusing.
If it's not that, then this error could also maybe be caused if you are using a non-default set of names for the magnitudes via bands
and do not also supply the dictionary of mag_limits
for those bands.
Ah yes thanks, that fixed it setting it True in the estimate stage too.
It probably makes more sense to store the only_colors value in the model, since a user shouldn't really be able to set it in the estimate stage if the model was trained for a specific case. I'll make a PR to do just that and merge in so that you only specify in the inform.
During Raphael Shirley's presentation on rail_lephare, Jeff mentioned that with very disparate mag distributions for training and test, the inclusion of a magnitude in the K nearest neighbor distance calculation could actually make things worse than just using colors without the magnitude. Adding an option to allow running with just colors would let people test this, should be very easy to implement.
Before submitting Please check the following: