cs50 / lib50

This is lib50, CS50's own internal library used in many of its tools.
GNU General Public License v3.0
16 stars 15 forks source link

how handle use of `files` in lab50? #8

Open dmalan opened 6 years ago

dmalan commented 6 years ago

So right now, lab50 supports a file key, which is either a string or a list of strings, which is based on CS50 Sandbox's support for a file HTTP parameter, https://cs50.readthedocs.io/sandbox/#configuration. In the context of HTTP, didn't seem appropriate to call it files, since multiple files have to be provided one at a time, a la:

?file=foo.c&file=bar.c

But in the context of .cs50.yaml, it seems a bit confusing to have lab50 use file and have check50 and submit50 use files. I could easily change over to files, but then I'd run afoul of the tag requirement, which shouldn't apply to lab50's use of files:

https://cs50.readthedocs.io/sandbox/#configuration

How best to handle?

Jelleas commented 6 years ago

The files key does make sense for lib50....

When we wrote this we envisioned files to be something used across tools (submit50 and check50) for which we wanted the exact same behaviour. Now that there's a third (lab50) that differs we should probably move validation for the files key to somewhere else in lib50.

You probably do want all the work we do for the files key (validating the key, figuring out wrong tags, creating the appropriate objects during parsing), but that to be customizable.

Something like this perhaps?:

loader = lib50.Loader()
loader.register(tag1, lambda value: Foo(value)) # loader.register(tag1, Foo)
loader.register(tag2, lambda value: Bar(value))
loader.load(config, "lab50")

This would let lib50 do the ugly work, and validate all tags for you. And we could just export a IncludeExcludeRequireLoader (better name wanted) for submit50 and check50.

dmalan commented 6 years ago

Relying on lib50 for tag parsing for lab50 too would be great, since I've been meaning to introduce syntax like

lab50:
  files:
    - !opened foo.c
    - !closed bar.c

so as to signal which tabs to open. So this would be handy. Could we perhaps avoid the need for a custom class per tag? Simpler would be a single wrapper class (e.g., Value) that defines the tags as attributes and has a __str__ method so that you can just write code like:

for file in files:
    print(f"Processing {file}")
    if file.opened:
        # do this
    else:
        # do that

Feels like a lot less overhead than defining one class per tag?

cmlsharp commented 6 years ago

Sure, that's what we do with !include/!exclude/!require. There's a single FilePattern class that has a type field with three variants. So it ends up looking like:

for file in files:
    if file.type is PatternType.Required:
        # do this
    elif file.type is PatternType.Include:
        # do that
dmalan commented 6 years ago

Should we perhaps simplify, though, so that we don't need to pre-define types, instead just setting properties to True for any present tags? Having to define PatternType.Required etc. feels like unnecessary work?

cmlsharp commented 6 years ago

We could, it just (a) makes it tougher to validate that the user didn't specify invalid tags, which we would want to error on and (b) the syntax wouldn't be quite as nice as your proposal (without resorting to some __getattr__ tricks) anyway since we can't define an attribute for all possible tags. Thus it would end up being something like if hasattr(file, "opened") or something like that. We could override __getattr__ to return None if the attribute isn't found, but that seems pretty hackish.

Also, as a general rule I think it is generally good practice to not allow an API to represent invalid states. It is possible for file.included and file.excluded to both be true even though that doesn't make any sense whereas in the current model, a file only has one type.

dmalan commented 6 years ago

Hm, having to register valid tags seems reasonable, in the spirit of:

loader = lib50.Loader()
loader.register(tag1, lambda value: Foo(value)) # loader.register(tag1, Foo)
loader.register(tag2, lambda value: Bar(value))
loader.load(config, "lab50")

But:

  1. Could/should we associate tags with specific keys somehow, since !required and such don't make sense outside the context of files?
  2. Do we really need to wrap the value with a custom class (e.g., Foo or Bar)? What functionality are those classes meant to provide, besides a constant for == checking? That feels like slightly annoying overhead for a caller to define?
Jelleas commented 6 years ago
  1. We're hooking into the PyYAML parser, and I don't see a direct way to add support for associating tags with keys. It seems YAML tags are supposed to be types with a global or document scope (https://camel.readthedocs.io/en/latest/yamlref.html#tags).

  2. We don't need custom classes for this. As an interface I propose .register(tag, callable(tag)) to take a tag and a callable that takes in a tag and returns something (we need to return the parsed YAML after all). What happens in that callable is up to the author. It could just be setting an instance variable, or adding something to a dict.

dmalan commented 6 years ago
  1. Sounds good, not a big deal.
  2. What might that look like specifically? Still feels like more work than should be needed just to check "is this string tagged with foo?", no? Might we just want to wrap all str values in the YAML with our own wrapper, a la the below?

class String:
  def __init__(self, value, tag = None):
      self.value = string
      self.tag = tag
      def __getattr__(self, name):
          return name == self.tag
      def __str__(self):
          return self.value
Jelleas commented 6 years ago

Oh! We could do something like that!

We can register a so-called multi-constructor (matching a prefix of a tag). That would just wrap every tagged string with a custom String (better name pending) class. Then the interface could just be:

config = lib50.Loader("require", "include", "exclude").load("<YAML_CONTENT>", "<TOOL>")

We don't even need to hack in __getattr__ if we require the author to specify the tags beforehand. We can simply set those tags to False by default.

cmlsharp commented 6 years ago

The __getattr__ trick does feel pretty hackish to me. Why not just do if string.tag == "foo" instead of if string.foo? It's such a small syntactic difference and makes it more clear to anyone reading it. It should be noted though that not all strings will be of this type, only strings that are tagged (which I think is a feature, but does seem slightly different than David's suggestion).

dmalan commented 6 years ago

not all strings will be of this type

Would it not be safer to wrap all str values, since tags themselves, per Jelle's note, can appear globally in a YAML file? And it'd definitely be annoying to have to adopt a convention of

if hasattr(s, "tag")

before accessing s.tag, no?

Jelleas commented 6 years ago

You're probably not checking str.tag at every part of the config though, only where you expect values to be tagged (like in the key files)? Seems weird to then want to wrap everything. So I agree with Chad, this is a feature!

That said we don't need __getattr__ in either case, because for the sake of lib50 validating the config, we need to know all tags beforehand anyway. So we know what tags can possibly exist, and can thus properly instantiate them on the object.

Coming back to my point earlier, since lib50.load() effectively serves as higher level (.cs50.yaml) specific parser. We could support local tags by parsing the whole .yaml file once, then parsing the parts where we expect tags individually again. This would allow us to raise more specific errors, and avoid tag conflicts.

dmalan commented 6 years ago

Okay, not every part, but how about when tags are optional? For labs, I think the common case will be "install these files and open them for the student," via:

lab50:
  files:
    - foo.c
    - foo.h

And it seems silly to require instead

lab50:
  files:
    - !opened foo.c
    - !opened foo.h

for the common case. So I'd envision a common scenario of, e.g.:

lab50:
  files:
    - application.py
    - !closed static/*
    - templates/*.html

or, equivalently:

lab50:
  files:
    - !opened application.py
    - !closed static/*
    - !opened templates/*.html

How best to accommodate optionality, then, without forcing use of hasattr or such?

Jelleas commented 6 years ago

I'd envision a flag in the loader (for that scope if we support local tag scope) something like default=True, that wraps everything in a tagless String by default. And causes lib50 to not throw an error in case of a missing tag (we want to throw an error in check/submit50).

So for lab50 it could be:

lib50.Loader("closed", default=True).load(...)

Or:

loader = lib50.Loader()
loader.scope("files", "closed", default=True)
loader.load(...)

Edit: Could also support a default tag, default="opened"

dmalan commented 6 years ago

Yeah, I think values can only have one tag, right? If so, setting a default tag is probably better than a default value, since (accidental) code like

lib50.Loader("closed", default=True).load(...)
lib50.Loader("opened", default=True).load(...)

would seem to create an ambiguity otherwise?

Jelleas commented 6 years ago

Implemented a working concept of Loader in lib50.config https://github.com/cs50/lib50/blob/7250882a65ac312f2c1dc281bc5a6fb9ac8532bd/lib50/config.py#L26

Unittests here: https://github.com/cs50/lib50/blob/7250882a65ac312f2c1dc281bc5a6fb9ac8532bd/tests/config_tests.py#L6

Interface is as follows:

loader = lib50.Loader("<TOOL>", "<GLOBAL_TAG_1>", "<GLOBAL_TAG_2>", ..., default="<DEFAULT_TAG>")
loader.scope("<KEY>", "<LOCAL_TAG_1>", "<LOCAL_TAG_2>", ..., default="<DEFAULT_TAG>")
loader.load(content)

For check50 we would do:

loader = lib50.Loader("check50")
loader.scope("files', "include", "exclude", "require")
loader.load(content)

For lab50:

loader = lib50.Loader("lab50")
loader.scope("files", "closed", default="opened") 
# or if !opened needs to be a valid tag
# loader.scope("files", "closed", "opened", default="opened") 
loader.load(content)

Currently you can only scope keys exactly one level deep (top-level is reserved for tools). That means you can scope keys like files, dependencies etc. Reason I did this was to avoid any complications with nesting scopes, or keys occurring multiple times in the YAML file.

As a result from .load() you get the same result as before, however every tagged value (or where a default was applied) is now an instance of TaggedValue. That has a tag, tags and value attribute, and a boolean attribute for every tag so that you can safely access tagged_value.<any_valid_tag> without having to check hasattr first.

I really like scopes here even in the case of check50, as we can now raise errors from lib50 if you tag a value where we're not expecting tags.

ps. I have not integrated any of this with lib50.files() yet, waiting for feedback :)

Jelleas commented 5 years ago

Here is how we currently envision lab50 can use lib50.files.

https://github.com/cs50/lib50/blob/7356e651e7ed46f49b358acb6f391b608eee92b9/tests/lib50_tests.py#L433-L471

dmalan commented 5 years ago

Few questions:

  1. What's the difference between !open and !include?
  2. What does a filename without a tag imply?
Jelleas commented 5 years ago
  1. !include would be, just include whatever matches the pattern, but don't open them. Whereas !open includes this specific file and opens it. This behavior is completely customizable.

  2. I set a default tag to !include. If a tag is missing, it gets tagged with !include.

https://github.com/cs50/lib50/blob/7356e651e7ed46f49b358acb6f391b608eee92b9/tests/lib50_tests.py#L453

Implemented the default behavior for this comment from last year (https://github.com/cs50/lib50/issues/8#issuecomment-413610981):

Okay, not every part, but how about when tags are optional? For labs, I think the common case will be "install these files and open them for the student," via:

lab50:
  files:
    - foo.c
    - foo.h

And it seems silly to require instead

lab50:
  files:
    - !opened foo.c
    - !opened foo.h

for the common case. So I'd envision a common scenario of, e.g.:

lab50:
  files:
    - application.py
    - !closed static/*
    - templates/*.html

or, equivalently:

lab50:
  files:
    - !opened application.py
    - !closed static/*
    - !opened templates/*.html

How best to accommodate optionality, then, without forcing use of hasattr or such?