chriscz / pysorter

A command line utility for organizing files and directories according to regex patterns.
Mozilla Public License 2.0

Rethink rule definitions #31

Open chriscz opened 7 years ago

chriscz commented 7 years ago

This issue is for discussing ways to make the rule definitions more flexible and accessible to a wider audience of non-Python programmers.

kontravariant commented 6 years ago

Following up on my reddit comment on this! Looking forward to helping if I can.

I like the proposition of YAML. Rules could be named, and priority can be given. They could be grouped à la Ansible groups.

I'm thinking something like this (from the README):

    ---
    pdf:
        - pattern = '.*\.pdf$'
        - dest = 'documents/pdf/'
        - priority = 0

(the pattern, dest and priority can either be explicit, or implicitly defined by their order, or both)

In this way, rules don't have to be manually reorganized to reassign priority. This could also be extended to allow for grouping definitions (possibly in the same file) and assigning priorities there (with a decision needing to be made on which priority takes precedence).
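The priority idea can be sketched in a few lines of Python. This is only an illustration, not pysorter code: the `Rule` tuple and both function names are hypothetical, and it assumes that a higher priority wins and that rules with equal priority keep their order in the file.

```python
import re
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    name: str          # hypothetical: the YAML key, e.g. "pdf"
    pattern: str       # regex matched against the file path
    dest: str          # destination directory
    priority: int = 0  # assumption: higher values take precedence


def order_rules(rules):
    """Sort by descending priority; sorted() is stable, so ties keep file order."""
    return sorted(rules, key=lambda rule: -rule.priority)


def first_match(rules, path) -> Optional[Rule]:
    """Return the highest-priority rule whose pattern matches the path."""
    for rule in order_rules(rules):
        if re.search(rule.pattern, path):
            return rule
    return None


rules = [
    Rule("pdf", r".*\.pdf$", "documents/pdf/", priority=0),
    Rule("invoices", r"invoice.*\.pdf$", "documents/invoices/", priority=10),
]
print(first_match(rules, "invoice_2017.pdf").dest)  # documents/invoices/
```

With explicit priorities, the more specific `invoices` rule wins even though it is listed after the general `pdf` rule.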

chriscz commented 6 years ago

Thanks for your interest Tyler!

I like the priority idea and can see its usefulness, but I want to think more about the overall structure of this before tackling it. I would especially like to see examples where existing functionality is possible.

Also, your YAML example is valid, but entries such as a = 1 are parsed as plain strings rather than key/value pairs. I believe you intended something like

 pdf:
    pattern : '.*\.pdf$'
    dest : 'documents/pdf/'
    priority : 0
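To make the difference concrete, here is a small sketch of what a YAML loader would be expected to hand back for each spelling, written as Python literals so it runs without any YAML library. `REQUIRED_KEYS` and `validate_rule` are hypothetical names used only for illustration.

```python
# Expected loader output for the "- pattern = ..." spelling: a list of strings.
list_form = {
    "pdf": ["pattern = '.*\\.pdf$'", "dest = 'documents/pdf/'", "priority = 0"],
}
# Expected loader output for the "pattern : ..." spelling: a mapping.
mapping_form = {
    "pdf": {"pattern": r".*\.pdf$", "dest": "documents/pdf/", "priority": 0},
}

REQUIRED_KEYS = {"pattern", "dest"}  # hypothetical minimum for a usable rule


def validate_rule(name, body):
    """Return a list of problems; an empty list means the rule is usable."""
    if not isinstance(body, dict):
        return [f"{name}: expected a mapping, got {type(body).__name__}"]
    missing = sorted(REQUIRED_KEYS - body.keys())
    return [f"{name}: missing key {key!r}" for key in missing]


for form in (list_form, mapping_form):
    for name, body in form.items():
        print(validate_rule(name, body) or "ok")
```

The first form is rejected because the rule body is a list of strings, while the second passes; a check along these lines would let the tool give a clear error message instead of failing later.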

Some questions that would drive the design (please tell me if you have any others)

  1. Would importing definitions from other files be useful to power users or does it complicate matters?
  2. How would we reference standard rule definitions such as the recent mimetypes pattern support?
  3. Support should remain for skipping (files and directory traversals)

Example

Below is a draft of some "nice-to-have" / usable styles for your suggestion of groups.

---
# non-rule configuration
config:
  # all non-absolute paths are relative to this file
  - include: 
    - name: foo
      path: foo.yml

      # name inferred as bar
    - path: bar.yml

  # include rules defined somewhere in python
  - pyinclude:
    # this variable doesn't exist yet, but will if this gets implemented
    - pysorter.rules.mimetypes

  - order:
    # x and y indicate groups
    - 'bar:xxx'

    # in this file
    - :documents

    # include all groups
    - 'foo:*'

    # all (remaining) groups from this file
    - ':*'

    # the python included rules
    - pysorter_rules_mimetypes

# rule definitions
documents:
  # a default priority for this group (when compared to other groups)
  # this could lead to confusion, hence the g prefix
  gpri : 0
  # (optional) prefix for all subsequent destinations.
  # This is literally prepended, so no implicit slash will be added
  dst-prefix: 'foo/'
  # a default destination inherited by all rules in this group
  dst : 'documents/' 
  rules:
      - pat: '(?P<year>\d{4})\.pdf$'
        dst: 'documents/pdfs_{year}.pdf'

      # long form
      - pat: '\.pdf$'
        dst: 'documents/pdfs/'
        pri: -10 # optional priority (higher takes precedence)

      # alternative shortform (also excluding quotes, which is apparently legal)
      # here only the pattern is given and the destination is inferred from dst.
      # would only be legal when group dst is defined
      - \.docx$

      # another shortform as a list
      - ['\.xlsx$', 'documents/excel/']

      # OR
      - - '\.xlsx$'
        - 'documents/excel/'

      # Skipping actions (skip these filetypes instead of moving)
      # for these the dst would not matter
      - pat: \.doc$
        action: skip
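As a rough illustration of how the rule spellings in the draft above could be handled, the sketch below normalizes each form into one dictionary shape and then expands a destination. `normalize_rule` and `resolve_destination` are hypothetical names, not existing pysorter functions, and the sketch assumes that `dst-prefix` applies to every destination in the group and that named regex groups are substituted with `str.format`.

```python
import re


def normalize_rule(entry, group_dst=None):
    """Turn any of the drafted rule spellings into one dict shape."""
    if isinstance(entry, str):                # shortform: pattern only
        entry = {"pat": entry}
    elif isinstance(entry, (list, tuple)):    # shortform: [pattern, destination]
        pat, dst = entry
        entry = {"pat": pat, "dst": dst}
    rule = {
        "pat": entry["pat"],
        "dst": entry.get("dst", group_dst),   # fall back to the group default
        "pri": entry.get("pri", 0),
        "action": entry.get("action", "move"),
    }
    if rule["action"] != "skip" and rule["dst"] is None:
        raise ValueError(f"rule {rule['pat']!r} has no dst and the group defines none")
    return rule


def resolve_destination(rule, path, dst_prefix=""):
    """Return the destination, or None if the rule does not match or is a skip rule."""
    match = re.search(rule["pat"], path)
    if match is None or rule["action"] == "skip":
        return None
    # The prefix is prepended literally, so no slash is inserted here.
    return dst_prefix + rule["dst"].format(**match.groupdict())


group = {
    "dst-prefix": "foo/",
    "dst": "documents/",
    "rules": [
        {"pat": r"(?P<year>\d{4})\.pdf$", "dst": "documents/pdfs_{year}.pdf"},
        r"\.docx$",
        [r"\.xlsx$", "documents/excel/"],
        {"pat": r"\.doc$", "action": "skip"},
    ],
}
rules = [normalize_rule(entry, group["dst"]) for entry in group["rules"]]
print(resolve_destination(rules[0], "taxes_2017.pdf", group["dst-prefix"]))
# foo/documents/pdfs_2017.pdf
```

Normalizing up front keeps the matching code independent of how many shortforms the file format ends up supporting.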

I won't have time to implement this in the next two months, so help will be appreciated.

Let me know what you think!

kontravariant commented 6 years ago

Glad to help! Looking forward to working on this.

  1. Would importing definitions from other files be useful to power users or does it complicate matters?

I would suggest modeling this after the Ansible Inventory system. In the long term, once rules can be defined in YAML files, they can be referenced by their inventory file name (i.e. inv1.ruleX, where ruleX is a rule in inv1.yaml) and put into groups or custom sets of definitions on the fly.
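A minimal sketch of how such a dotted reference might be resolved, assuming each rule file has already been parsed into a dictionary keyed by file name. `resolve_reference` and the sample data are hypothetical and say nothing about how Ansible itself resolves names.

```python
def resolve_reference(ref, inventories):
    """Look up '<inventory>.<rule>' in already-loaded rule files."""
    inventory_name, sep, rule_name = ref.partition(".")
    if not sep or not rule_name:
        raise ValueError(f"expected '<inventory>.<rule>', got {ref!r}")
    try:
        return inventories[inventory_name][rule_name]
    except KeyError:
        raise KeyError(f"unknown rule reference {ref!r}") from None


# Stand-in for the parsed contents of inv1.yaml (hypothetical data).
inventories = {
    "inv1": {"ruleX": {"pattern": r".*\.pdf$", "dest": "documents/pdf/"}},
}

# A custom group assembled on the fly from references.
group = [resolve_reference(ref, inventories) for ref in ["inv1.ruleX"]]
print(group[0]["dest"])  # documents/pdf/
```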

  2. How would we reference standard rule definitions such as the recent mimetypes pattern support?

I assume you are referring to this commit? I'm not quite sure I know what you are suggesting, but I would think standard rule definitions can either be left alone, or overridden if a rule conflicts.

  3. Support should remain for skipping (files and directory traversals)

I like your suggestion for skipping

---
  - pat: \.doc$
    action: skip

However I think it may be better to reserve a keyword "skips" for this purpose: i.e.

# rule definitions
documents:
  gpri : 0
  dst-prefix: 'foo/'
  dst : 'documents/' 
  rules:
        # rules here...
  skips:
      - ['\.docx$', '\.xlsx$']
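A possible way to evaluate a `skips` key like the one above, as a sketch only: entries may be single patterns or lists of patterns, and any match means the path is left alone. `compile_skips` and `is_skipped` are hypothetical names.

```python
import re


def compile_skips(skips):
    """Flatten the skips entries (strings or lists of strings) into regexes."""
    patterns = []
    for entry in skips:
        if isinstance(entry, str):
            patterns.append(entry)
        else:
            patterns.extend(entry)
    return [re.compile(pattern) for pattern in patterns]


def is_skipped(path, compiled_skips):
    """True if any skip pattern matches; such paths would not be moved."""
    return any(regex.search(path) for regex in compiled_skips)


group = {"skips": [[r"\.docx$", r"\.xlsx$"]]}
compiled = compile_skips(group["skips"])
print(is_skipped("budget.xlsx", compiled), is_skipped("notes.pdf", compiled))
# True False
```

Checking skips before any move rule keeps them independent of rule priorities, which is one argument for a dedicated keyword.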

I will start familiarizing myself with the codebase and design patterns!

chriscz commented 6 years ago

  1. I have no experience with Ansible yet, though I started looking at it recently; literally the day before you commented. Since it's based on an existing design, it saves much effort on our side.
  2. Yes that is the correct commit. It's basically just an extension to the rules that adds Python's mimetypes to the list of rules.
  3. I much prefer the skips keyword. I can't think of a case where my suggestion for skip rule items would behave differently from this.
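For readers unfamiliar with the standard library module mentioned in point 2, the sketch below shows how `mimetypes.guess_type` could drive a destination. It is not the code from the commit: `mimetype_destination` and the "type/subtype/" directory layout are assumptions made for illustration.

```python
import mimetypes


def mimetype_destination(filename):
    """Map a filename to a 'type/subtype/' directory, or None if unknown."""
    mime, _encoding = mimetypes.guess_type(filename)
    return None if mime is None else mime + "/"


print(mimetype_destination("report.pdf"))  # application/pdf/
```

Files whose type cannot be guessed return `None` here, so they would fall through to whatever other rules are defined.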

I will start familiarizing myself with the codebase and design patterns!

If you have any questions, we could use them to start a FAQ for future developers.

kontravariant commented 6 years ago

I've started digging in and devising a strategy for tackling this. Very exciting! (These will also be my first contributions to a public project, so any guidance regarding incidental faux pas is welcome, and I apologize in advance.)

I like the priority idea and can see its usefulness, but I want to think more about the overall structure of this before tackling it.

I'm quickly realizing this task will require a fairly comprehensive specification for the filetypes.yaml file. Would you prefer I use the wiki to develop such a spec (and would I do so on this repo or in my fork)? Or should I create a test-case filetypes.yaml as a sort of living document, i.e. "documentation by example"?

chriscz commented 6 years ago

I prefer the documentation by example approach, since it leads to more pragmatic solutions. Use cases should feed the design and vice versa, and they doubly serve as initial test cases for the processor.

As soon as you have a good idea of the initial design requirements, feel free to add the examples and the spec to the wiki so we can discuss it.

kontravariant commented 6 years ago

I've started a wiki page for fleshing out this design. It's worth noting that the example filetypes.yaml there (copied from my fork) is a portion of the filetypes.py currently in the repo, i.e. it is the beginning of a mirror image of filetypes.py (so that, ostensibly, they could both be persistent in the project and provide a starting point dependent on the user's preference).

chriscz commented 6 years ago

Awesome, I have some minor changes I'd like to make to the spec on the Wiki. You should be able to view the diff for clarity, though I'll type a summary here.

chriscz commented 6 years ago

Just changed the version to use integers instead of strings. There's a skip rule that reads .idea$. Is that meant to be the .idea directory? If so, it should be .idea/$. Same question about the git rule.
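The difference between the two patterns can be checked quickly in Python. This sketch assumes directories are presented to the rules with a trailing slash, as the comment implies; the sample paths and the `matching` helper are made up. It also includes an escaped-dot variant, since an unescaped `.` matches any character.

```python
import re

paths = [".idea/", "project/.idea/", "my_idea/", "notes.idea", "my_idea"]


def matching(pattern, candidates):
    """Return the candidates that the pattern matches anywhere in the path."""
    return [path for path in candidates if re.search(pattern, path)]


for pattern in [r".idea$", r".idea/$", r"\.idea/$"]:
    print(pattern, "->", matching(pattern, paths))
# .idea$ -> ['notes.idea', 'my_idea']
# .idea/$ -> ['.idea/', 'project/.idea/', 'my_idea/']
# \.idea/$ -> ['.idea/', 'project/.idea/']
```

Without the trailing slash the pattern matches only files, and without the escaped dot it also catches unrelated names that merely end in "idea".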

So far so good :)

https://github.com/chriscz/pysorter/wiki/Filetypes/_compare/757a46d40b1aa674219d260c57029d2c0a44f370...e11e25a2f3280747335ba7b9a59eba233e47d9e3