Closed yoonspark closed 1 day ago
Thanks a lot!
Regarding your questions:
PyMFBase.save()
method as I am using the SunTopic.save()
method pickle
was discouraged for security reasons (https://www.benfrederickson.com/dont-pickle-your-data/) is that incorrect?
The following are my suggestions to improve the source code. Please let me know if you have any questions.
Major Suggestions
[x] In
PyMFBase
, raiseNotImplementedError
for unimplemented methods (e.g.,_init_h()
) with informative message.[x]
setup_logging()
is duplicated inPyMFBase
andsuntopic
classes. I recommend refactoring it into a function in a separate module (e.g.,utils.py
).[x] Currently,
predict_Y_mse()
is defined insidesuntopic.hyperparam_cv()
, leading to a long code block. Refactorpredict_Y_mse()
into an internal method, i.e.,suntopic._predict_Y_mse()
, and call it insuntopic.hyperparam_cv()
usingself
.[x] Remove unnecessary code in
suntopic.py
:https://github.com/TillRS/SUN_TopicModel/blob/d6e3846af8a16241f4187c9b8dea27d268a50d30/src/sun_topicmodel/suntopic.py#L90
https://github.com/TillRS/SUN_TopicModel/blob/d6e3846af8a16241f4187c9b8dea27d268a50d30/src/sun_topicmodel/suntopic.py#L93
[x] Remove unnecessary
return
statements insuntopic.py
:https://github.com/TillRS/SUN_TopicModel/blob/d6e3846af8a16241f4187c9b8dea27d268a50d30/src/sun_topicmodel/suntopic.py#L245
https://github.com/TillRS/SUN_TopicModel/blob/d6e3846af8a16241f4187c9b8dea27d268a50d30/src/sun_topicmodel/suntopic.py#L442
https://github.com/TillRS/SUN_TopicModel/blob/d6e3846af8a16241f4187c9b8dea27d268a50d30/src/sun_topicmodel/suntopic.py#L481
Minor Suggestions
[x] To improve code readability, consider breaking long chained expressions into multiple shorter ones. For instance, the following line of code may look cryptic to a new user/contributor:
https://github.com/TillRS/SUN_TopicModel/blob/d6e3846af8a16241f4187c9b8dea27d268a50d30/src/sun_topicmodel/suntopic.py#L222
[x] You may use an f-string to reduce repetition of
print()
statements insuntopic.summary()
(andsuntopic.cv_summary()
), like so:[x] Consider making class attributes "internal" (i.e., prefixed with
_
) if they are not intended for user access (e.g.,suntopic._W
,suntopic._H
).[x] The
suntopic
class has a lot of attributes prefixed withcv_
, which are used to pass data between different methods. Given these are all values related to cross validation, I suggest using a single attribute (e.g.,cv_values
) to store all these values. Specifically, you can initialize it as an empty dictionary undersuntopic.__init__()
, where you can explicitly describe what it is for and what values it is supposed to contain. This will make the code easier to read and maintain.Questions
PyMFBase.save()
stores theferr
value but it is not retrieved inPyMFBase.load()
. Is this omission by design?In the
suntopic
class, the_model_pred
attribute is not accessed outside thepredict()
function. Is there a reason why it is stored and used as an attribute? Else, it would be better to instead use a local variable for this.Is there a reason why the code is using
numpy
rather thanpickle
for saving and loading models? Asking this becausepickle
can simplify the code for serializing and deserializing model objects.