I'm not sure about the first idea. I think you want to make `OpenPechaFS` a very high-level class, but I don't think that's a good idea. We need a class that interacts with a local `.opf` folder and nothing more, and the current `OpenPechaFS` is exactly that, so I wouldn't change it.
Now, I also agree that we need a high-level class that reserves an ID, creates the repo on GitHub, downloads it, etc., and that's `OpenPechaGitRepo`. So instead I would do:
pecha = OpenPechaFS(path_to_pecha)  # no change
pecha = OpenPechaGitRepo(base_path, pecha_id=None, rev="main")  # downloads the pecha from GitHub if it exists, otherwise creates one with that id; if pecha_id is None, reserves a new id
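A minimal sketch of the decision logic that the `OpenPechaGitRepo` constructor would run, assuming a hypothetical `remote` object (with `exists`, `clone` and `create`) standing in for the GitHub calls and a hypothetical `reserve_new_id` callable. Neither is an existing OpenPecha API; both are injected here only so the logic can be exercised without network access. The checkout folder includes the revision so two revisions of one pecha don't collide.

```python
from pathlib import Path
from typing import Callable, Optional, Protocol


class Remote(Protocol):
    """Hypothetical GitHub operations; not an existing OpenPecha API."""

    def exists(self, pecha_id: str) -> bool: ...

    def clone(self, pecha_id: str, rev: str, dest: Path) -> None: ...

    def create(self, pecha_id: str) -> None: ...


class OpenPechaGitRepo:
    def __init__(
        self,
        base_path: Path,
        remote: Remote,
        reserve_new_id: Callable[[], str],
        pecha_id: Optional[str] = None,
        rev: str = "main",
    ) -> None:
        self.base_path = Path(base_path)
        self.rev = rev
        # No id given: reserve a fresh one before touching GitHub.
        self.pecha_id = pecha_id if pecha_id is not None else reserve_new_id()
        if remote.exists(self.pecha_id):
            # Existing pecha: download the requested revision into its own folder.
            remote.clone(self.pecha_id, self.rev, self.base_path / f"{self.pecha_id}_{self.rev}")
        else:
            # Unknown id: create the repo on GitHub under that id.
            remote.create(self.pecha_id)
```

The caller only chooses `base_path`, `pecha_id` and `rev`; whether that means cloning or creating is decided inside the class.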
Note that if we add the `rev` argument, the path on disk should become `self.base_path / f"{self.pecha_id}_{self.rev}" / f"{self.pecha_id}.opf"`. Otherwise you'll get bugs when opening one pecha at two different revisions, since both checkouts would share the same folder. Overall I think it's a good thing, good idea!
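The per-revision path rule, written as a standalone helper (the name `opf_path` and the example ids are hypothetical). Each path component is built with an f-string and joined with `/`, so no string `+` is mixed into the `Path` expression.

```python
from pathlib import Path


def opf_path(base_path: Path, pecha_id: str, rev: str) -> Path:
    """Local .opf path for one pecha checked out at one revision:
    base_path / "<pecha_id>_<rev>" / "<pecha_id>.opf"
    """
    return Path(base_path) / f"{pecha_id}_{rev}" / f"{pecha_id}.opf"


main = opf_path(Path("/data/pechas"), "P000001", "main")
old = opf_path(Path("/data/pechas"), "P000001", "v1.0")
```

`main` and `old` live in different parent folders, so opening both revisions at once cannot overwrite each other's files.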
Also, I think creating a new OpenPecha ID should not be hidden deep inside the `Metadata` generation; it should be a static method of `OpenPechaGitRepo` that anyone can use:
pecha_id = OpenPechaGitRepo.reserve_new_id()  # reserves a new id that is not currently in use on GitHub
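A minimal sketch of such a static method, assuming a hypothetical `is_taken` hook (in practice it would ask GitHub whether a repo with that name exists) and a hypothetical id format of `P` followed by eight uppercase hex digits; the real OpenPecha id format may differ.

```python
import secrets
from typing import Callable


class OpenPechaGitRepo:
    @staticmethod
    def reserve_new_id(is_taken: Callable[[str], bool], max_tries: int = 100) -> str:
        """Return a random id that `is_taken` reports as unused."""
        for _ in range(max_tries):
            # Hypothetical id format: "P" + 8 uppercase hex characters.
            candidate = "P" + secrets.token_hex(4).upper()
            if not is_taken(candidate):
                return candidate
        raise RuntimeError(f"no free id found after {max_tries} attempts")
```

Checking and then creating is not atomic: two callers could both see the same id as free. A real "reservation" would therefore claim the id by creating the GitHub repo immediately and retry if that creation fails.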
Note that in `ocr_formatter` you can then use `OpenPechaGitRepo.reserve_new_id()` to pass the `pecha_id` argument to `create_opf`. That's a small change that should solve your current issue.
I think having `output_path` in the `__init__` of `OCRFormatter` is not ideal; it would be better to have `opf_path` as an argument of `create_opf`. That way `OCRFormatter` doesn't need to deal with creating a new ID, cloning GitHub repos, etc., which I don't think is its job: it's just there to write some files into an opf.
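A sketch of the shape this proposal gives `OCRFormatter`: the caller hands over the `.opf` folder and the id, and the formatter only writes files. The `create_opf` parameters, the `base/00001.txt` file name and the one-line `meta.yml` content are hypothetical, not the current OpenPecha layout.

```python
from pathlib import Path


class OCRFormatter:
    """Sketch: the formatter only writes files into an .opf folder it is given."""

    def create_opf(self, opf_path: Path, pecha_id: str, text: str) -> Path:
        opf_path = Path(opf_path)
        base_dir = opf_path / "base"
        base_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical layout: one base text file plus a minimal meta.yml.
        (base_dir / "00001.txt").write_text(text, encoding="utf-8")
        (opf_path / "meta.yml").write_text(f"id: {pecha_id}\n", encoding="utf-8")
        return opf_path
```

No ID generation or Git operation happens in the formatter; the caller is responsible for choosing (or reserving) both `opf_path` and `pecha_id`.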
And finally, I think it would be much cleaner to have a `create_in_op` method in `OCRFormatter`. That way everything is abstracted correctly: we already have our `OpenPecha` object in the function, it can be any subclass of `OpenPecha`, and that's the cleanest solution.
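A sketch of why accepting the `OpenPecha` object keeps the formatter generic: it fills in content and calls `save()`, and each subclass decides where that goes. The `OpenPecha` base class below is a minimal stand-in with a hypothetical interface (`base` dict, abstract `save`), and `InMemoryPecha` is a hypothetical subclass used in place of something like `OpenPechaFS` or `OpenPechaGitRepo`.

```python
from abc import ABC, abstractmethod


class OpenPecha(ABC):
    """Minimal stand-in for the OpenPecha base class (hypothetical interface)."""

    def __init__(self, pecha_id: str) -> None:
        self.pecha_id = pecha_id
        self.base = {}  # base name -> text

    @abstractmethod
    def save(self) -> None:
        """Persist the pecha; each subclass decides where and how."""


class InMemoryPecha(OpenPecha):
    """Hypothetical subclass that just records that it was saved."""

    def __init__(self, pecha_id: str) -> None:
        super().__init__(pecha_id)
        self.saved = False

    def save(self) -> None:
        self.saved = True


class OCRFormatter:
    def create_in_op(self, pecha: OpenPecha, text: str) -> OpenPecha:
        # Only fill in the content; storage is delegated to the pecha's own save().
        pecha.base["00001"] = text
        pecha.save()
        return pecha
```

Swapping in a filesystem-backed or Git-backed subclass would not require any change to `OCRFormatter`.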
What do you think?
> it would be better to have `opf_path` as an argument of `create_opf`. That way the OCRFormatter doesn't need to deal with creating a new ID.
So that means the caller/client code using the Formatter has to prepare `opf_path`?
How is it a leaky abstraction?
`output_path` is the path to store all the pechas created by the formatters. It defaults to `~/.openpecha/pechas/`.
But in `OCRFormatter`'s `create_opf`, the `output_path` is passed as the `path` of `OpenPecha`, which is the `opf_path`. Now the caller code, e.g. OCR-pipelines, needs to create an `opf_path` for the pecha, which in turn requires creating a `pecha_id`. But `pecha_id` generation is handled by `Metadata`, which is only created inside the Formatters. So with the current implementation, the pecha will be saved at the `opf_path` created by the caller code, and since the caller has no access to `Metadata` creation, `Metadata` will generate a new `pecha_id`. We end up with different `pecha_id`s in the `opf_path` and in `meta.yml`.

Therefore, I think this is a leaky abstraction. The caller code should only need to tell the Formatters where to store all the pechas.
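A minimal sketch of the alternative argued for here, where the caller passes only `output_path` and the formatter generates the id exactly once, using it for both the folder and `meta.yml`. The id format, the `base/00001.txt` file and the one-line `meta.yml` are hypothetical placeholders, not OpenPecha's actual layout.

```python
import uuid
from pathlib import Path


class OCRFormatter:
    """Sketch: caller passes only output_path; the formatter owns the pecha id."""

    def __init__(self, output_path: Path = Path.home() / ".openpecha" / "pechas") -> None:
        self.output_path = Path(output_path)

    def create_opf(self, text: str) -> Path:
        # Generate the id once and reuse it for the folder and meta.yml,
        # so the two can never disagree.
        pecha_id = "P" + uuid.uuid4().hex[:8].upper()  # hypothetical id format
        opf_path = self.output_path / pecha_id / f"{pecha_id}.opf"
        (opf_path / "base").mkdir(parents=True, exist_ok=True)
        (opf_path / "base" / "00001.txt").write_text(text, encoding="utf-8")
        (opf_path / "meta.yml").write_text(f"id: {pecha_id}\n", encoding="utf-8")
        return opf_path
```

Because the id is created in one place, the `{output_path}/{pecha_id}/{pecha_id}.opf` folder and the id recorded in `meta.yml` always match, while `output_path` stays configurable.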
Why does this problem exist?
I think there are two scenarios, creating and loading a pecha, and we don't have a clean way to handle the two.

Solution
1. When creating new pechas

We should initialise the `pecha` object with the actual data, like `base`, `layers`, `metadata`, etc. When saving, we should provide `output_path`, which is the parent path of the pecha path, as in `{output_path}/{pecha_id}/{pecha_id}.opf`. That way `output_path` is configurable, which is the desired behaviour.

2. When loading existing pechas

I think we should go with `classmethods`, for example: