docarray / docarray

Represent, send, store and search multimodal data
https://docs.docarray.org/
Apache License 2.0

There are many errors on the “Multimodal deep learning with DocArray” page #1889

Open Kodp opened 4 months ago

Kodp commented 4 months ago

Initial Checks

Description

I couldn't find anywhere else to report documentation errors. The page https://docs.docarray.org/how_to/multimodal_training_and_serving/ defines

class PairTextImage(BaseDoc):
    text: TextDoc
    image: ImageDoc

and then uses it like this:

da = DocList[PairTextImage](
    PairTextImage(text=Text(text=i.caption), image=Image(url=f"Images/{i.image}"))
    for i in df.itertuples()
)

However, the Image class is never defined or imported; the field is declared as ImageDoc.

def __call__(self, text: Text) -> None:
    assert isinstance(text, Text)
    text.tokens = Tokens(
        **self.tokenizer(
            text.text, padding="max_length", truncation=True, max_length=48
        )
    )

Likewise, the Text class is not defined; the field is declared as TextDoc.

Example Code

No response

Python, DocArray & OS Version

I couldn't find anywhere else to report document errors

Affected Components

Kodp commented 4 months ago
from docarray.data import MultiModalDataset

dataset = MultiModalDataset[PairTextImage](da=da, preprocessing=preprocessing)
loader = DataLoader(
    dataset,
    batch_size=128,
    collate_fn=dataset.collate_fn,
    shuffle=True,
    num_workers=4,
    multiprocessing_context="fork",
)

The keyword argument in this snippet is wrong: MultiModalDataset has no da parameter. Running it raises TypeError: MultiModalDataset.__init__() got an unexpected keyword argument 'da'

JoanFM commented 4 months ago

It seems the parameter is docs and not da. Would you be able to open a PR correcting the documentation?
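
For anyone hitting the same TypeError, one way to check which keyword a constructor expects is to inspect its signature. The following is only a minimal sketch: StandInDataset and accepted_keywords are hypothetical names made up for illustration, and the stand-in merely copies the docs / preprocessing parameter names mentioned in this thread. It is not the real DocArray class. With DocArray installed, the same helper could presumably be pointed at MultiModalDataset instead.

```python
import inspect


class StandInDataset:
    """Hypothetical stand-in, NOT the real docarray MultiModalDataset.

    It only mirrors the `docs` / `preprocessing` keyword names discussed in
    this thread so that the check below runs without DocArray installed.
    """

    def __init__(self, docs, preprocessing):
        self.docs = docs
        self.preprocessing = preprocessing


def accepted_keywords(cls):
    """Return the parameter names of cls.__init__, excluding `self`."""
    params = inspect.signature(cls.__init__).parameters
    return [name for name in params if name != "self"]


# List the keyword names the constructor accepts.
keywords = accepted_keywords(StandInDataset)
print(keywords)

# Passing the keyword used on the documentation page reproduces the failure.
error_message = None
try:
    StandInDataset(da=[], preprocessing={})
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Passing `docs` instead constructs the object without error.
dataset = StandInDataset(docs=[], preprocessing={})
```

If the printed list contains docs rather than da, renaming the keyword in the snippet above should be enough to get past the error.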