Closed mabeckers closed 3 months ago
Hi @mabeckers, thanks for the contribution. It's great to see DMRL being added to Cornac. However, there are a few things that we might need to reconsider. First, each model in Cornac is very self-contained, and model dependencies should not be added as global requirements. This is to minimize the maintenance effort for the core functions, and also to facilitate a wider range of model implementations. With that said, there are two directions to proceed with the DMRL model; one is to rely on FeatureModality for either text/video. We can consider whether an additional VideoModality is needed, or whether ImageModality could be used as an alternative for data input. Hope that my explanation is clear enough. Happy to chat more.
Hi @tqtg, thanks for taking a look at my PR. Yeah, I noticed that the two new transformer modalities introduced more general dependencies. I can go ahead and move those modules inside recom_dmrl, so that the model receives basic text and images as input. I will make it general, so that if one already has encoded features from somewhere (say they come with the example), the model can take those in as well and will not run another layer of encoding on top of that feature set. Does that sound good to you?
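The input handling described above could be sketched roughly like this. This is an illustration of the idea only, not the actual DMRL API; the names `get_item_features` and `encode_fn` are hypothetical:

```python
# Hypothetical sketch: accept either raw text or features that were already
# encoded elsewhere, and only run the model's own encoder on raw input.
def get_item_features(raw_docs=None, encoded_features=None, encode_fn=None):
    if encoded_features is not None:
        # Features came pre-encoded from outside: use them as-is and do
        # not stack another layer of encoding on top.
        return encoded_features
    return encode_fn(raw_docs)

calls = []

def fake_encoder(docs):
    # Stand-in for a transformer encoder; records that it was invoked.
    calls.append(len(docs))
    return [[float(len(d))] for d in docs]

pre = [[1.0], [2.0]]
assert get_item_features(encoded_features=pre, encode_fn=fake_encoder) is pre
assert calls == []  # encoder untouched for pre-encoded input
assert get_item_features(raw_docs=["a", "bc"], encode_fn=fake_encoder) == [[1.0], [2.0]]
```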
Thanks, Max
Sounds good to me. Let's do that and see how it goes.
Made the requested changes, please let me know if there's anything else I can change for this PR. Also remerged with the latest cornac master. Thanks!
@mabeckers I did some changes to make the tests work and also refactoring. Please have a look and see if they make sense to you.
Everything looks great to me!
Hey @mabeckers, there is something we need to modify about the model input (text and image modalities). By design, we don't input the modalities directly to the model; instead, we input them to an evaluation method (e.g., RatioSplit). The reason is that the modalities will then be aligned with the user/item data splitting, with user/item IDs mapped properly. Taking the CDL model as an example, we input the text modality to the RatioSplit eval method (here), and we can access the text modality inside the model implementation via the train_set (here). Can we work on this last change before we merge the model into Cornac?
Hey @tqtg, yeah, I understand that's how the Cornac framework works with modalities. That's why, up until commit 9fc96b3, I had it that way and was feeding modalities from the outside into the RatioSplit instance. I only changed it and moved them inside the model because you mentioned you didn't want to add any general dependencies (such as new TransformerModalities) to the Cornac core, but rather move those into the DMRL folder and have the model accept raw text and images. That's why I moved it out of RatioSplit. I am happy to revert to the earlier version and introduce TransformerVisionModality and TransformerTextModality as new general modality encoders. Of course, I can also just keep them in the DMRL folder and still use them as normal modalities, inputting them to the RatioSplit instance. Just let me know which way you would prefer. Thanks!
My point is that you can reuse TextModality and ImageModality to hold the image/text corpus and input them into RatioSplit to perform data splitting. The only part we want to move inside model implementation is where we use Transformers to encode the raw data. Does that make sense to you?
OK, I see. So if I'm understanding you correctly, you would want the example file running the DMRL example to look something like this:
```python
"""Example for Disentangled Multimodal Recommendation, with only feedback
and textual modality. For an example including image modality please see
dmrl_clothes_example.py"""

import cornac
from cornac.data import Reader
from cornac.datasets import citeulike
from cornac.eval_methods import RatioSplit
from cornac.models.dmrl.recom_dmrl import TextModalityInput

docs, item_id_ordering_text = citeulike.load_text()
feedback = citeulike.load_feedback(reader=Reader(item_set=item_id_ordering_text))

text_modality_input = TextModalityInput(item_id_ordering_text, docs)

dmrl_recommender = cornac.models.dmrl.DMRL(
    batch_size=4096,
    epochs=20,
    log_metrics=False,
    learning_rate=0.01,
    num_factors=2,
    decay_r=0.5,
    decay_c=0.01,
    num_neg=3,
    embedding_dim=100,
    text_features=text_modality_input,
)

# Returns a generic feature modality (or even a TextModality) where the
# pre-encoded text is given in the .features attribute; uses a Transformer
# internally.
item_text_modality = dmrl_recommender.encode_text()

ratio_split = RatioSplit(
    data=feedback,
    test_size=0.2,
    exclude_unknowns=True,
    verbose=True,
    seed=123,
    rating_threshold=0.5,
    item_text=item_text_modality,
)

rec_300 = cornac.metrics.Recall(k=300)
prec_30 = cornac.metrics.Precision(k=30)

cornac.Experiment(
    eval_method=ratio_split,
    models=[dmrl_recommender],
    metrics=[prec_30, rec_300],
).run()
```
@mabeckers I made some changes to illustrate my idea. Please have a look and let me know if they make sense to you. We can further refactor the code to remove some unused parts.
@tqtg I had to set preencode=True (preencode means the corpus is encoded as part of the TransformerModality init; preencoded means it was already encoded from outside), but other than that it looks very good! I understand now what you meant: we use TextModality for data splitting and ID mapping on the outside, and "overwrite" it on the inside with TransformerModalities. I'm just running some final checks and then will commit! Thanks for showing me this way of doing it. The only downside here is that we call vectorizer.fit_transform(self.corpus) in _build_text() of the TextModality when all we want is _swap_text(), so there is a little overhead, but I'm fine doing it that way :)
@mabeckers OK, I thought it should be encoded batch by batch during training, thus preencode=False. Anyway, please help check, because I might have misinterpreted your implementation.
For the basic text transformation overhead, I'm aware of that, and it might be an issue with a big text corpus. I was thinking of using the tokenizer as the indicator of whether we want to do any transformation or not. If the tokenizer is not provided during the initialization of the TextModality, we just bypass the _build_text() call. What do you think?
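The bypass suggested here could look roughly like this. The class name is a stand-in; the attribute and method names mirror Cornac's TextModality, but this is a sketch of the idea, not the library's actual code:

```python
# Hypothetical sketch: use the presence of a tokenizer as the switch for
# whether TextModality runs its (potentially expensive) _build_text() step.
class TextModalityLike:
    def __init__(self, corpus=None, ids=None, tokenizer=None):
        self.corpus = corpus
        self.ids = ids
        self.tokenizer = tokenizer
        self.built = False

    def _build_text(self):
        # Stand-in for the expensive vectorizer.fit_transform(self.corpus).
        self.built = True

    def build(self):
        if self.tokenizer is not None:
            self._build_text()
        # No tokenizer: skip the transformation entirely; the raw corpus
        # stays available for a model-side encoder (e.g., a Transformer).
        return self

raw = TextModalityLike(corpus=["a b", "c d"]).build()
assert raw.built is False  # no tokenizer -> _build_text() bypassed

tok = TextModalityLike(corpus=["a b", "c d"], tokenizer=str.split).build()
assert tok.built is True   # tokenizer given -> transformation runs
```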
Yeah, the tokenizer is a good idea; that would make sense. The TransformerModalities work both in batch mode as well as with pre-encoding; I'm just using them with pre-encoding in my examples because it makes runtime a lot faster. I just pushed my final changes and checks. Thanks!
Cool! This PR looks good to me. Let's merge it and have another one to update the TextModality. Thanks @mabeckers!
Description
I added Disentangled Multimodal Representation Learning (https://arxiv.org/pdf/2203.05406.pdf) as the DMRL model to Cornac. In the context of this addition, I had to:
Checklist:

- README.md (if you are adding a new model).
- examples/README.md (if you are adding a new example).
- datasets/README.md (if you are adding a new dataset).