IBM / unitxt

🦄 Unitxt: a python library for getting data fired up and set for training and evaluation
https://unitxt.rtfd.io
Apache License 2.0
139 stars 29 forks

Add safe and well maintained operator for running external code from web sources #954

Open elronbandel opened 1 week ago

elronbandel commented 1 week ago

prototype:

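from typing import List

from unitxt.operators import FieldOperator


class WebModuleOperator(FieldOperator):
    source_urls: List[str]  # all downloaded to a temporary directory
    function: str  # imported from the temporary directory when set
    hash: str  # used to make sure the files haven't changed

    def prepare(self):
        # Setup logic goes here: download the files, check the hash,
        # and import the function as self._loaded_func.
        pass

    def process(self, value):
        return self._loaded_func(value)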
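
One way the setup step in the prototype above might be filled in, as a rough standalone sketch rather than unitxt code. The helper names (download_sources, content_hash, load_verified_function) are hypothetical. So are the assumptions that `function` is given as "module.function", that the hash is a SHA-256 over file names and contents, and that the module does not import sibling files.

```python
import hashlib
import importlib.util
import tempfile
from pathlib import Path
from typing import Callable, Dict, List
from urllib.parse import urlparse
from urllib.request import urlopen


def download_sources(source_urls: List[str]) -> Dict[str, bytes]:
    """Fetch every URL and key its bytes by the file name in the URL path."""
    sources = {}
    for url in source_urls:
        file_name = Path(urlparse(url).path).name
        with urlopen(url) as response:
            sources[file_name] = response.read()
    return sources


def content_hash(sources: Dict[str, bytes]) -> str:
    """SHA-256 over file names and contents, visited in sorted name order."""
    digest = hashlib.sha256()
    for file_name in sorted(sources):
        digest.update(file_name.encode("utf-8") + b"\0")
        digest.update(sources[file_name])
    return digest.hexdigest()


def load_verified_function(
    sources: Dict[str, bytes], function: str, expected_hash: str
) -> Callable:
    """Check the hash first, then write the files to a tempdir and import."""
    actual_hash = content_hash(sources)
    if actual_hash != expected_hash:
        raise ValueError(
            f"hash mismatch: expected {expected_hash}, got {actual_hash}"
        )
    # Assumption: `function` looks like "module.function".
    module_name, function_name = function.rsplit(".", 1)
    with tempfile.TemporaryDirectory() as temp_dir:
        for file_name, data in sources.items():
            (Path(temp_dir) / file_name).write_bytes(data)
        module_path = Path(temp_dir) / f"{module_name}.py"
        spec = importlib.util.spec_from_file_location(module_name, module_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes the downloaded code
    return getattr(module, function_name)
```

With this shape, prepare could call load_verified_function(download_sources(self.source_urls), self.function, self.hash) and keep the result as self._loaded_func. The hash only guarantees that the files match what was reviewed when the hash was pinned. It does not make the code itself safe.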

yoavkatz commented 1 week ago

I think it should be called ApplyRemoteCode - and have a similar API to Apply.

ApplyRemoteCode("a", function="koko", to_field="b", source_urls=['https:... '], source_urls_content_hash="XXXXX")
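
To make the intended semantics concrete, here is a small sketch of the field mapping such a call implies. It assumes the Apply-style convention that positional arguments name the input fields and the result is stored in to_field. ApplyLoadedFunction is a hypothetical stand-in, not a unitxt class, and it takes an already loaded callable instead of source URLs.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class ApplyLoadedFunction:
    """Hypothetical stand-in for the field-mapping part of ApplyRemoteCode."""

    from_fields: Tuple[str, ...]  # input field names, e.g. ("a",)
    func: Callable[..., Any]  # the function, already loaded and verified
    to_field: str  # where the result is stored, e.g. "b"

    def process(self, instance: Dict[str, Any]) -> Dict[str, Any]:
        # Pass the values of the input fields as positional arguments.
        args = [instance[name] for name in self.from_fields]
        instance[self.to_field] = self.func(*args)
        return instance
```

In this reading, ApplyRemoteCode("a", function="koko", to_field="b", ...) would pass instance["a"] to koko and write the result to instance["b"], leaving "a" untouched.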