This would be awesome, especially for the local backends like TinyDB or file-storage.
I think the first step would be to decide on a web API. Ideally, this would somehow be codified in the form of an abstract Python class, which each database backend would then subclass.
Any thoughts? I'm happy to look through the current implementation and propose an interface, but it might be more appropriate for the main contributors to do this kind of "deep design" thinking.
My idea was to make an abstract class that PyMongoDataAccess would inherit from (if that makes any sense in a language like Python, where we can do "duck typing"; but perhaps it does, for the sake of intelligibility). The data format accepted and returned by PyMongoDataAccess would then be considered the reference for other implementations, which is basically what @gideonite suggested.
I've tried to document the format of the parameters requested by the MongoDB backend, and I should do the same with the return values.
Unfortunately, I've prioritised other features and aspects of the project (such as decoupling the HTML view from its model to support testability), so I've "postponed" this; not to mention that new features like filtering (currently in the development branch) could be more difficult to implement with backends like file-storage.
Of course, any help would be appreciated. Basically, the important methods of the current interface are:

```python
def get_runs(self, sort_by=None, sort_direction=None,
             start=0, limit=None, query={"type": "and", "filters": []}): ...

def get_run(self, run_id): ...

def connect(self): ...  # not as important for other implementations; only referenced in bootstrap.py
```
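For illustration, here is a minimal sketch of what such an abstract base class might look like. This is a sketch, not the actual Sacredboard code; the signatures are copied from the methods above, including the mutable default for `query`:

```python
from abc import ABC, abstractmethod


class DataStorage(ABC):
    """Illustrative sketch of an abstract data-access interface."""

    @abstractmethod
    def get_runs(self, sort_by=None, sort_direction=None,
                 start=0, limit=None,
                 query={"type": "and", "filters": []}):
        """Return an iterable of runs that also exposes a count() method."""

    @abstractmethod
    def get_run(self, run_id):
        """Return a single run as a dictionary, or None if not found."""

    def connect(self):
        """Optional: open a connection; only referenced in bootstrap.py."""
```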
The approach Sacredboard currently uses is to initialise the data gateway and put it into the current_app.config["data"] variable in bootstrap.py. It's definitely not the best solution, but it might be enough for now.
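Roughly, the wiring looks like this simplified sketch; the actual code in bootstrap.py differs, and the constructor arguments shown here are hypothetical:

```python
# Sketch of the bootstrap wiring described above (arguments illustrative).
data = PyMongoDataAccess("localhost", 27017, "sacred")  # hypothetical args
data.connect()  # the connect() method mentioned above
app.config["data"] = data  # views read it via current_app.config["data"]
```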
If anyone creates a proof-of-concept for other backends using the hints provided here, I'm happy to consult on it, but I cannot start developing it on my own now because of my other duties.
I was also wondering whether we might try replacing (or extending?) the current pseudo-API with something nicer, such as GraphQL. But that would just be a nice-to-have.
I've implemented a basic interface called DataStorage, and PyMongoDataAccess inherits from it. I've also started implementing a FileStorage class which simply does get_run and get_runs. I'm getting some weird DataTables errors, though, when I start up the web app. I assume this is because either (1) the data format doesn't jibe with the MongoDB format, or (2) I haven't implemented any filtering.
To create a run, I simply load run.json and set run["config"] = load_config("config.json").
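In code, that might look roughly like the sketch below. It is based on the description above, not on the actual branch; the _load_json helper and the directory layout are assumptions:

```python
import json
import os


class FileStorage(DataStorage):
    """Sketch of the approach described above; details are assumptions."""

    def __init__(self, path_to_dir):
        self.path_to_dir = path_to_dir

    def get_run(self, run_id):
        # Each run is assumed to live in its own directory containing
        # run.json and config.json, as Sacred's file observer writes them.
        run_dir = os.path.join(self.path_to_dir, str(run_id))
        run = self._load_json(os.path.join(run_dir, "run.json"))
        run["config"] = self._load_json(os.path.join(run_dir, "config.json"))
        return run

    @staticmethod
    def _load_json(path):
        with open(path) as f:
            return json.load(f)
```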
Any pointers at this point? Obviously, it would be nice to get it to run. I'm also a noob when it comes to writing Pythonic code, so I will happily take code-style comments.
Here's the commit on my fork: https://github.com/gideonite/sacredboard/commit/8cfac39ef97afbd9debf2e0242ab0a14c2ebe073
Thanks! You can run Sacredboard with debugging enabled (--debug) and navigate to 127.0.0.1:5000/api/runs. There you can see that the returned object has to implement a "count" method reporting the total number of records (DataTables uses this for pagination etc.).
One solution is to add this to datastorage.py:
```python
class DataIterator:
    def __init__(self, count, iterable):
        self.iterable = iterable
        self._count = count

    def count(self):
        """The number of items."""
        return self._count

    def __iter__(self):
        return iter(self.iterable)
```
and to modify your get_runs method:
```python
def get_runs(self, sort_by=None, sort_direction=None,
             start=0, limit=None, query={"type": "and", "filters": []}):
    all_run_ids = os.listdir(self.path_to_dir)
    blacklist = {"_sources"}  # directories that are not runs

    def run_iterator():
        for run_id in all_run_ids:
            if run_id in blacklist:
                continue
            yield self.get_run(run_id)

    count = len([run_id for run_id in all_run_ids if run_id not in blacklist])
    return DataIterator(count, run_iterator())
```
But even then there is a problem: the loaded JSON does not contain information about data types. Therefore, dates and times, such as the experiment start time and the last heartbeat time, are loaded as strings, and they need to be converted to an appropriate Python representation using, e.g.:

```python
datetime.datetime.strptime('2017-05-28T12:18:33.091144', '%Y-%m-%dT%H:%M:%S.%f')
```
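For example, a helper along these lines could do the conversion when a run is loaded. The key names "start_time", "stop_time", and "heartbeat" are assumptions about the run.json schema:

```python
import datetime

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def parse_timestamps(run):
    # Convert ISO-8601 strings loaded from JSON into datetime objects.
    # The key names are assumptions; adjust to the actual run.json schema.
    for key in ("start_time", "stop_time", "heartbeat"):
        if isinstance(run.get(key), str):
            run[key] = datetime.datetime.strptime(run[key], ISO_FORMAT)
    return run
```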
I hope this is enough to make it work (without sorting, pagination, and filtering).
Regarding coding style: if you have the tox package installed, you can run tox -e flake8. It's quite pedantic, so there is still a lot of work to do even in my code. When everything is fine (line breaks, spaces, etc.), it starts checking the documentation strings.
What I personally found useful is to write tests together with the main code. You can find inspiration in test_mongodb.py (it uses a library that mocks MongoDB, so the test behaves almost as if it were talking to a real database).
Great feedback! Sorry for the delay in responding.
I've made some changes that (I think) address these issues. Namely, I created a FileStoreCursor object. This brings up another point: the object-oriented programmer in me says to create another abstraction to match "DataStorage", called "DataCursor". I don't know if this is Pythonic, but I don't know how else to guarantee that others who want to extend this code for their own datastore know what to implement.
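For illustration, such a "DataCursor" abstraction might look like the sketch below. This is hypothetical; it just formalises the count/iterate contract that DataIterator above already follows:

```python
from abc import ABC, abstractmethod


class DataCursor(ABC):
    """Hypothetical sketch: the contract a backend's cursor must satisfy."""

    @abstractmethod
    def count(self):
        """Return the total number of runs, for DataTables pagination."""

    @abstractmethod
    def __iter__(self):
        """Iterate over the run dictionaries."""
```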
I've had some trouble with tox because I use Anaconda instead of virtualenv. I couldn't find a developer wiki page, and at this point it might be a good idea to create one. We could put in some of the info that's been mentioned in this thread, such as tox, --debug, etc., so that future developers can quickly see how to get started. I'm interested to know your development setup.
Here's the latest: https://github.com/chovanecm/sacredboard/compare/develop...gideonite:filestore
Great job! Don't worry about the delay; I'm currently busy with my university anyway. I'll try to have a look at it tomorrow (I'm in Europe), so for now only in brief (see the earlier note about running Sacredboard with the --debug option).
In two weeks, I am going to defend my thesis and take the "final state examination", which is necessary to graduate here in the Czech Republic. I probably won't have much time until then because I basically need to go through almost all the courses I took to prepare for the exam.
But I was thinking about either creating a wiki page or generating Sphinx documentation (it is a kind of standard for Python projects). That's why I started with completing the code documentation docstrings: they would become part of the generated doc. But I'm glad you asked; it should probably be one of my priorities now.
I tried it now. Nice that the list actually shows something! Now I see I should definitely add tests for the interaction with the backend (#40) so that developers on the backend side always know whether the data they return is in the expected format.
Basically, the object you are building with _create_run should correspond to the object key in the "/api/run/id" example in #40. So I think that in order for even the Run detail view to work, you need to set run["_id"] and run["info"].
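For instance, that could look roughly like this. It is an illustrative sketch only; the signature of _create_run and the exact values depend on your file layout:

```python
def _create_run(self, run_id, runjson):
    # Sketch: make sure the keys the Run detail view expects are present.
    runjson["_id"] = run_id            # e.g. the run's directory name
    runjson.setdefault("info", {})     # the detail view reads run["info"]
    return runjson
```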
Again, thanks for doing this. It is helping not only with the functionality but also with realising what's necessary to document and test.
Ah ok, now I understand a bit more about the context of the project. Makes sense.
I see your point now. Perhaps the more Pythonic way of doing this, instead of heavy-handedly forcing conformity via interfaces/abstract classes and the like, is to implement the required functionality in tests.
My only motivation for using the term "cursor" is to match the return type of the MongoDB interface.
OK, I will make sure that the data type conforms to #40 and implement a test to make sure that the right keys are returned, etc. It will probably be some kind of integration test in which I create some experiments in /tmp and make sure the resulting JSON has the right keys.
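A rough sketch of such a test, assuming pytest and the FileStorage class from the branch above; the module path, constructor signature, and fixture contents are all assumptions:

```python
import json

from sacredboard.app.data.filestorage import FileStorage  # module path assumed


def test_run_has_required_keys(tmpdir):
    # Write a minimal fake experiment directory, shaped roughly like what
    # Sacred's file observer would produce (contents assumed).
    run_dir = tmpdir.mkdir("1")
    run_dir.join("run.json").write(json.dumps({"status": "COMPLETED"}))
    run_dir.join("config.json").write(json.dumps({"seed": 42}))

    storage = FileStorage(str(tmpdir))  # constructor signature assumed
    run = storage.get_run("1")
    for key in ("_id", "config", "info"):
        assert key in run, "missing key required by the /api/run/id format"
```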
It's OK to have abstract classes in Python where appropriate, such as the DataStorage class you created. It is good that it tells developers what methods they have to implement. In the case of FileStoreCursor (aka DataIterator), I was just thinking the class could be put directly at the datastorage level so that developers of other database backends can use it too.
When you talk about integration tests, do you mean it would involve running Sacred too? That would be amazing, but if the test failed, it might not be easy to tell whether the problem was caused by Sacred or by Sacredboard. So I would see this as an additional level of testing.
I think testing of your module could be solved, for instance, by generating a few reports with Sacred, copying the directory e.g. to tests/data/fixtures/filestore, and loading the runs from there.
You could perhaps also try to decouple loading the JSON files from building the Run objects, as it looks like the latter part could be reused, e.g. for the TinyDB backend (maybe by extracting the _create_run method somewhere else; but once I have time again, I can try to design a better solution for it).
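For example, the reusable part could be extracted into a module-level function, roughly like this; the name build_run and its shape are hypothetical:

```python
def build_run(run_id, run_data, config):
    # Backend-agnostic step: turn already-parsed JSON data into a run object
    # that the web API expects, so FileStorage, a future TinyDB backend, etc.
    # could all reuse it.
    run_data["_id"] = run_id
    run_data["config"] = config
    run_data.setdefault("info", {})
    return run_data
```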
Currently, only MongoDB is supported as a backend. However, Sacred is able to use other databases as well; this could be supported in Sacredboard too.