OpenSemanticLab / osw-python

GNU Affero General Public License v3.0

enh: Convenience wrapper for file download/upload #50

Open MatPoppFHG opened 6 months ago

MatPoppFHG commented 6 months ago

Currently, a simple file download takes 20-30 lines of code. This should be reduced to one line. Suggestion:


import pandas as pd

import osw
from osw.express import file_download

file: osw.model.LocalFile = file_download("https://wiki-dev.open-semantic-lab.org/w/img_auth.php/3/3b/OSWe7d2664d26334d99a10093557bba77d5.txt")

df = pd.read_csv(file.raw)

What should happen in the background:

Suggestion for function definition:


from io import BytesIO
from pathlib import Path
from typing import Optional, Union


def file_download(file: str, credentials_path: str = "usual/path/to/local/model", download_cache: str = "",
                  osw_domain: Optional[str] = None, return_binary: bool = False) -> Union[Path, BytesIO]:
    """
    Convenience function to quickly download a file from OSW via a URL that can be copied from the browser address bar.
    """
    if osw_domain is not None:
        # Prepend the domain to `file`; otherwise `file` must already contain a valid domain + OSW ID.
        ...  # body still to be defined

LukasGold commented 6 months ago

What we would still need, or what is currently problematic: reloading the model.entity classes generated by the controller imports leads to errors when serializing objects, because the model classes loaded at the top by importing osw don't match the classes created by the reload (they are not the same objects in memory).
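The class mismatch can be reproduced without osw-python. The following is a minimal sketch using only the standard library; `demo_entity` and `Entity` are hypothetical stand-ins for the generated entity module, not the real osw code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a generated entity module (not the real osw.model.entity).
MODULE_SOURCE = "class Entity:\n    pass\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    Path(tmp_dir, "demo_entity.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    try:
        import demo_entity

        old_class = demo_entity.Entity
        obj = old_class()  # instance created before the reload

        importlib.reload(demo_entity)  # re-executes the module body
        new_class = demo_entity.Entity  # a new class object with the same name

        same_class = old_class is new_class
        still_instance = isinstance(obj, new_class)
    finally:
        sys.path.remove(tmp_dir)
        sys.modules.pop("demo_entity", None)

print(same_class)      # False
print(still_instance)  # False
```

Any code that type-checks or validates `obj` against the reloaded class will therefore reject it, even though the source of the class is unchanged.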

Option a) Therefore, in my opinion, just as the controllers are part of osw-python, the (File) models should also be part of it. This is scalable and can be transferred, now and in the future, to other schemata. Idea for implementation: before overwriting and re-importing entity.py, check whether doing so would result in any changes at all. Can this be implemented at the code generator level?

Option b) Alternatively, the models could be persisted and not replaced if already loaded. This would require maintaining osw-python in parallel with the PagePackages.
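The change check mentioned under option a) could look roughly like the sketch below. It is not taken from the osw-python code generator; `write_if_changed` is a hypothetical helper that compares content hashes and only rewrites the file (and thus only triggers a re-import) when the generated source actually differs.

```python
import hashlib
import tempfile
from pathlib import Path


def write_if_changed(target: Path, new_source: str) -> bool:
    """Write new_source to target only if the content differs.

    Returns True if the file was (re)written, False if it was left untouched.
    """
    new_bytes = new_source.encode("utf-8")
    if target.exists():
        old_digest = hashlib.sha256(target.read_bytes()).hexdigest()
        if old_digest == hashlib.sha256(new_bytes).hexdigest():
            return False  # nothing changed, so no re-import is needed
    target.write_bytes(new_bytes)
    return True


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "entity.py"
    first = write_if_changed(target, "class Entity:\n    pass\n")
    second = write_if_changed(target, "class Entity:\n    pass\n")
    third = write_if_changed(target, "class Entity:\n    name: str\n")

print(first, second, third)  # True False True
```

The caller would reload the module only when the function returns True, so unchanged schemas never replace classes that are already in use.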

SimonStier commented 6 months ago

The actual file download can be condensed to 2 lines (see https://github.com/OpenSemanticLab/osw-python/blob/a30bb5e248bebcb75ec6e2c15f850f87b0f2b5cd/examples/file_upload_download.py#L60C1-L63C18)

wf2 = osw_obj.load_entity("File:OSWe7d2664d26334d99a10093557bba77d5.txt").cast(WikiFileController, osw=osw_obj) 
LocalFileController(path="dummy2.txt").put_from(wf2)

(The first line may look bloated, but this approach will be consistent for any controller, incl. DeviceController, DbController, etc.) So I guess the discussion is more about the osw initialization (login, schema download) and maybe a wrapper for the two lines above.
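A wrapper that accepts a browser URL, as in the original suggestion, would first have to derive the domain and the page title from that URL. A minimal sketch of that step follows; `split_file_url` is a hypothetical helper, and it assumes that the last path segment is the file name and that the page title is that name in the "File:" namespace.

```python
from typing import Tuple
from urllib.parse import unquote, urlparse


def split_file_url(url: str) -> Tuple[str, str]:
    """Split a file URL into (domain, page title).

    Assumption: the last path segment is the file name, and the page
    title is that name prefixed with "File:".
    """
    parsed = urlparse(url)
    file_name = unquote(parsed.path.rstrip("/").rsplit("/", 1)[-1])
    if not parsed.netloc or not file_name:
        raise ValueError(f"Not a usable file URL: {url!r}")
    return parsed.netloc, f"File:{file_name}"


domain, title = split_file_url(
    "https://wiki-dev.open-semantic-lab.org/w/img_auth.php/3/3b/OSWe7d2664d26334d99a10093557bba77d5.txt"
)
print(domain)  # wiki-dev.open-semantic-lab.org
print(title)   # File:OSWe7d2664d26334d99a10093557bba77d5.txt
```

The resulting title has the same form as the argument passed to load_entity in the two lines above, and the domain could be used to select the matching credentials.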

LukasGold commented 6 months ago

@MatPoppFHG A first draft is ready for review at: https://github.com/OpenSemanticLab/osw-python/compare/main...50-enh-osw-express