My first thought is that splitting things up into modules is definitely the right way to go. In that vein (and following refactoring suggestion 1), I think decoupling all code that doesn't require the `info.plist` file data (i.e. not directly related to the specific workflow being run) should be purely functional. Right now, I'm basically creating a `workflow.Workflow` instance at the top of every module to get access to simple functionality (`decode` and `logger` are the main two, but I've used lots of others).
I also think that `filter()` has grown too bulky as a function. I would like to hear people's thoughts on making it a class. It seems to me that all Script Filters in a workflow could simply be different instantiations of the `ScriptFilter` class, with certain properties/attributes set. This would allow for cleaner modularity in the underlying code (not one API function with 9 params) and seems more intuitive to me. I have other thoughts here, but I'd like to see if people think "hell no" before I elaborate.
Finally, I've recently started thinking about how Alfred-Workflow might be able to take advantage of data in the plist file beyond the simple metadata stuff. As Dean and I have discussed in Issue __, I have some code that will automatically run all the connected Script Filters and Run Script nodes with some simple input (it's high-level testing, basically exercising the API of your entire workflow to ensure it works. I see it as a complement to Python-style unit tests). As Dean has shown in some of his workflows, there's also the possibility of programmatically creating nodes in the plist file. I wonder if/how Alfred-Workflow might add functionality to build the initial version of the Alfred workflow (creating all the nodes with the command and escaping prefs preset), where we'd only need to organize (yay visual clarity) and connect nodes.
On the testing front, I'd actually like to propose some minor API changes to make workflow (as in my workflow, not Alfred-Workflow) testing easier. I like to create a faux environment when initially building the workflow, and later when debugging and adding features. This environment contains the source code (this directory mirrors what would be in a user's `workflowdir` directory after downloading the workflow) as well as a testing environment where I create dummy `datadir` and `cachedir` directories. The basic structure looks like this:
```
<root> (think ~/Code/<workflow>)
├── Pandoctor.alfredworkflow
├── README.md
├── src
│   ├── icon.png
│   ├── icons/
│   ├── info.plist
│   ├── pandoctor.py
│   ├── requirements.txt
│   ├── utils.py
│   └── workflow/
└── testing_env
    ├── cache_dir/
    └── data_dir/
```
Each time I create this tree, I have to manually go into `workflow.py` and alter the `cachedir` and `datadir` path properties to point to these dummy directories. I'd like to add setters for these properties, so that I can write testing code without having to ensure that I go back and fix those properties before I build.
Those are my initial thoughts. Everything in Dean's post I didn't comment on I simply agree with.
> I think decoupling all code that doesn't require the `info.plist` file data (i.e. not directly related to the specific workflow being run) should be purely functional.
That's the general idea.
> access to simple functionality (`decode` and `logger` are the main two, but I've used lots of others).
I imagine most of the existing `Workflow` methods will remain, but many will be wrappers around functions that now live in other modules. Logging and `decode()` will definitely go into other modules (where they can be used by scripts without needing a `Workflow` instance).
> It seems to me that all Script Filters in a workflow could simply be different instantiations of the `ScriptFilter` class, with certain properties/attributes set.
I'm afraid I don't know what you're driving at there. A class with 7 parameters instead?
At any rate, the main reason `filter()` is one huge, hairy function and not a collection of smaller, saner ones is that function (and especially method) calls are expensive compared to inline code. Splitting it into a bunch of smaller functions would hurt performance, which is pretty important in `filter()`.
By all means, feel free to split it up into more sensibly-sized functions and test the performance. I'd be very interested to see the results.
> I have some code that will run all the Script Filters and Run Script nodes that are connected automatically with some simple input.
If you can think of some way to make that generalisable and testable (i.e. verify that the result was correct), I'd be interested. Personally, I can't see how that can be done (workflows have a lot of side effects well outside the scope of what a Python script could capture).
> I wonder if/how Alfred-Workflow might add functionality to build the initial version of the Alfred workflow.
That sounds more like a task for cookiecutter. I guess there could be a place for such a script in a `tools` directory within the repo (alongside `workflow-build.py` and `workflow-install.py`, perhaps), but I think the work required to make something superior to Alfred's UI for creating the basic workflow is probably not worth the effort (how do you intend to handle the icon, for example?)
> > I wonder if/how Alfred-Workflow might add functionality to build the initial version of the Alfred workflow.
>
> That sounds more like a task for cookiecutter. I guess there could be a place for such a script in a `tools` directory within the repo (alongside `workflow-build.py` and `workflow-install.py`, perhaps), but I think the work required to make something superior to Alfred's UI for creating the basic workflow is probably not worth the effort (how do you intend to handle the icon, for example?)
The idea is not actually to build the workflow (not completely), just to put the nodes in the `info.plist` so that you don't have to do the `+` -> Inputs -> Script Filter -> type in the shell command for the script -> deselect the escaping options; then repeat. All you'd need is a function that takes the shell command as a string and writes the Script Filter XML to the `info.plist` file. This would allow you to then open the workflow in Alfred and finish actually building it (connecting nodes (like Filter to Run Script), filling in the metadata, adding the image).
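A minimal sketch of such a helper (hypothetical; the node keys mimic how Alfred stores Script Filters in `info.plist` and should be verified against a real workflow):

```python
import plistlib
import uuid

def add_script_filter(plist_path, shell_command, keyword=''):
    """Append a Script Filter node with the escaping options deselected."""
    info = plistlib.readPlist(plist_path)  # Python 2 API
    info.setdefault('objects', []).append({
        'type': 'alfred.workflow.input.scriptfilter',
        'uid': str(uuid.uuid4()).upper(),
        'config': {
            'keyword': keyword,
            'script': shell_command,
            'escaping': 0,  # all escaping options off
        },
    })
    plistlib.writePlist(info, plist_path)

add_script_filter('info.plist', '/usr/bin/python pandoctor.py "{query}"')
```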
I spent some time reading the code base.
`fold_to_ascii()`

I don't like such a huge dictionary in the code. After a little research, I found two solutions: unidecode on PyPI, and the `unicodedata` module.
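For reference, a minimal sketch of the `unicodedata` approach (it only strips combining marks from characters that decompose; anything else non-ASCII is dropped by `'ignore'`):

```python
import unicodedata

def fold_to_ascii(text):
    # u'Jürgen café' -> u'Jurgen cafe'
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')
```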
`filter()`

This is the most complicated function in the library. It has an 83-line docstring and 9 parameters. For a long time I had no idea how it works. It simply does too many things at once.
Here are my suggestions:

1. Separate it from `workflow.py`. Dean also mentioned this, and I agree 100%. `filter()` is not a behaviour of the workflow.
2. Use a class or methods. Honestly, there isn't much difference between a 9-parameter function and a 7-parameter constructor for me. However, I prefer a class because, among other things, you can use `match_on` methods instead of lots of `if`s, which makes the code clearer and easier to use.
3. Split it into multiple small functions. A huge function is something we should try to avoid (as is an 83-line docstring). It's too hard for people to understand. On the performance issue, I did a little test (see the sketch below): the difference in running time between 10,000,000 function calls and inline code is about 0.8 seconds. I don't think that's a huge difference, considering Python is not a 'fast' language.
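A sketch of that kind of micro-benchmark (numbers vary by machine; this is not the exact test owenwater ran):

```python
import timeit

def add(a, b):
    return a + b

inline = timeit.timeit('x = 1 + 2', number=10000000)
called = timeit.timeit('x = add(1, 2)',
                       setup='from __main__ import add',
                       number=10000000)
print('inline: %.2fs, calls: %.2fs, overhead: %.2fs'
      % (inline, called, called - inline))
```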
One more thing: `filter()` currently filters on `max_result` before `min_score`, which I think should be the other way round.
Just as Dean mentioned. One little thing: the Data and Cache storage have a lot of similar code that could potentially be reduced.
I have some bash scripts that do similar things, e.g. create an empty workflow or cd to the workflow directory based on the workflow name. They are all very basic (and ugly) scripts. It's an interesting idea to have those commands and make them perfect (I just hate click :smile:). However, my only concern is that it may be too much for a library. I'd much prefer to do it in another project (or just throw the idea to the Alfred team?).
A release version (exported from Alfred) doesn't have to log debug information. Also, magic arguments could be disabled, because they're mostly for developers. It can't be done perfectly without support from Alfred, but a config file for quick switching is something I'd consider.
Only one thing: `RESPONSES` can be replaced by `httplib.responses`.
Same as Dean: unit tests shouldn't depend on real HTTP requests. Some of them kept failing recently, which may have been caused by GitHub because I ran them too many times.
First of all, I suppose I should make clear that Alfred-Workflow is very much aimed at novice programmers who may be writing their first code when they decide to try to write a workflow. This is my main reason for turning things like debugging and magic arguments on by default. I realise that neither would be considered "best practice" by experienced coders (I sure as hell don't do it in my other projects), but they're not the main audience (and are quite capable of turning them off).
As long as there are no real, negative effects in practice, helpful or useful to novices > The Right Thing™.
That's a nice trick with `unicodedata` and NFKD normalisation. There are a lot of non-Latin characters currently in `ASCII_REPLACEMENTS` that won't be replaced using that method, though. Which is to say, it'd probably represent a large regression in terms of functionality. Any change would probably have to be a combination of the two.
I'm all for reducing complexity if it doesn't hurt performance. Definitely agree regarding making configuration easier.
This came up in #37. There's no point repeating myself, as I gave a detailed answer there. The TL;DR is that debug logging doesn't hurt performance unless you go really crazy, and turning it back on again can be difficult for non-technical users. Even if it could be turned on/off in a config file, users would still have to go hunting around in `~/Library` (which is now hidden by default) to find the file.
They are and they aren't for developers. Originally (though their scope has grown) they were intended to help developers to help users solve problems (this was before Alfred had a debugger). Now they serve dual role of helping with fixing problems (e.g. clear the cache/data/settings) and providing access to some of the library's features without every workflow developer having to code their own update interface. You can get away with:
```python
if wf.update_available:
    wf.add_item('An update is available',
                'Enter `workflow:update` to install the new version',
                icon=ICON_SYNC)
```
in a Script Filter. Otherwise, every workflow would need to implement its own interface to the update settings and its own logic to trigger an update.
I don't see much actual (practical) benefit in disabling them by default. Nobody has ever complained that they cause problems (i.e. interfere with normal queries), and it's trivial to turn them off, anyway.
The arguments against magic arguments being on by default, as with debugging, seem to be mostly theoretical "that's not how you should do it" ones, not "it causes this actual problem" ones.
As with debugging, if there were a simple and foolproof way to turn magic args on and off, I'd be quite happy to do The Right Thing™ and turn them off by default. But there isn't. Or at least, I haven't thought of one yet.
Yeah, `RESPONSES` was largely copy-pasted from `httplib.responses`. Mostly for clarity, but also because there were a couple of things I changed. There's probably no good reason it shouldn't be replaced with `from httplib import responses as RESPONSES`.
Yeah, the real HTTP requests really should go and be replaced by canned responses. I just haven't got round to writing a proper fake response object.
Mostly, I'd like to make as many of the test cases as possible run without needing the workflow-like environment. Then all the HTTP requests wouldn't be such a problem…
> Each time I create this tree, I have to manually go into workflow.py and alter the cachedir and datadir path properties to point to these dummy directories.
@smargh: I missed this one. `cachedir` and `datadir` are preferentially set from the environmental variables `alfred_workflow_cache` and `alfred_workflow_data` respectively. If you ensure these are set in the calling environment, you shouldn't have to mess around with `Workflow`'s code.
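For example, a test harness could point Alfred-Workflow at the dummy directories from the tree above before `Workflow` is instantiated (a minimal sketch; the paths are assumed from smargh's layout):

```python
import os

# must be set before cachedir/datadir are first accessed
os.environ['alfred_workflow_cache'] = os.path.abspath('testing_env/cache_dir')
os.environ['alfred_workflow_data'] = os.path.abspath('testing_env/data_dir')

from workflow import Workflow

wf = Workflow()
assert 'testing_env' in wf.cachedir
assert 'testing_env' in wf.datadir
```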
@deanishe

`fold_to_ascii()`: I think you are right. Besides, I just looked at unidecode from PyPI again. Firstly, it uses GPL v2, so we can't really use it. Also, the way it solves the problem is by creating 190 dictionaries, which is actually very similar to the solution we have. So the only thing I want to say is that we should move the dictionary into a separate file to keep the real code clean.
Firstly, I'm sorry I didn't pay attention to the earlier ticket, so I didn't know someone had already had this idea. As for the problem itself, I do understand what you said here. Even though I don't agree 100% with everything, it isn't worth arguing every detail as long as it doesn't cause any actual problem. And I believe it won't.
That's all I got currently. I will keep thinking in the next few days and post more ideas if I can come up with more. :smile:
@owenwater
You can't be expected to go through the old issues :grinning:
We looked at `unidecode` when we discussed diacritic folding. The main reason for not using it was the same as for writing `web.py` instead of including requests: the library is huge. It's 2.7 MB installed and compiled (its `setup.py` doesn't say it's zip-safe, but it's still 450 KB zipped). Best case, it would triple the size of Alfred-Workflow for a rather minor feature.
OTOH, perhaps we should consider re-tooling Alfred-Workflow to use `unidecode` if it's installed, and fall back on its own ghetto folding if not.
I wouldn't go for a fallback approach, because then it could get really hard to debug the workflow in case of a problem with either `unidecode` or `Workflow`'s minimalistic implementation.
As @owenwater suggested, just move the dict to a separate file :smile:
> it could get really hard to debug the workflow
Even if it logged the library and version it's using?
The dict is going into a separate file one way or another :grinning:
Well, what are the advantages of using `unidecode`, and what are the chances that it is installed? I just checked, and it's not on my Mac, even though I have plenty of packages installed.
BTW: it'd be cool if you could start pre-releasing v2 as soon as possible. We've been talking about a lot of things in the last weeks that we have postponed until v2. Additionally, discussing things is much easier if you can try something out :smile:
`unidecode` works with a whole bunch of languages, not just a handful of European ones. I believe it does a passable job with Russian, though I personally can't judge that. Here's the idea behind it (but don't be "yet another German" :wink:).

FWIW, I have it installed on my system. I think I was using it to generate URL slugs or web-friendly filenames.
I will start pulling things apart over Christmas when I'm on holiday. I just wanted to have a fairly clear idea of where I'm going first. A lot of the stuff I postponed till v2 was because it breaks the API.
Thanks, @deanishe.
How about you keep the minimalistic implementation as the default, and if someone needs more advanced Unicode capabilities, they can enable `unidecode`?

If you implement a fallback, someone could just ship Alfred-Workflow with an included `unidecode`, right? All the dev has to do is run one more command, something like this:

```
pip install --target=/path/to/my/workflow unidecode
```
I know that you postponed stuff because v2 has a different API, which can cause trouble. But we know about the changes and could use the new features and test them if you do pre-releases :smile:
> But we know about the changes and could use the new features and test them if you do pre-releases
True enough. I think it'll be a while before there are any new features. Refactoring first…
> How about you keep the minimalistic implementation as the default, and if someone needs more advanced Unicode capabilities, they can enable `unidecode`?
I'm thinking more along the lines of: a user can install `unidecode` if they happen to work with languages that Alfred-Workflow doesn't cover, and AW will automatically pick it up.

I see the support for comprehensive transliteration as being more user-specific than workflow-specific.

For example, a lot of my friends are Russian, but as we live in Germany, we all have German notebooks with German keyboards. If AW transparently supported `unidecode`, your Gmail Client for Alfred workflow would allow them to also search their Cyrillic messages with a German keyboard. You wouldn't need to explicitly support it, nor include 2.7 MB of `unidecode` in Gmail Client for Alfred just to provide a feature that's super-useful, but only to a very small proportion of users.
I'm finishing up a first attempt at implementing `Workflow.filter()` using SQLite's Full Text Search capabilities. As we've already talked about, Dean, the speed is insane, especially on larger datasets. I have an alpha version of the function (right now I'm building it all into one function, just so that you can easily follow what's happening) up on my fork of your index demo repo.

My question is: how would we like to add this functionality? The first point is that I cannot implement any of the initial-matching patterns using FTS. Right now I have these filter flags available:
```python
MATCH_STARTSWITH = 1
MATCH_ATOM = 2
MATCH_SUBSTRING = 4
MATCH_ALL = 7
```
Given the radically different nature of the functionality, I cannot simply add all the initial matching, as this requires iterating over the items. Should we make two types of filtering available? The pure Python version with all of the initial matching, best suited for smaller datasets where you expect plenty of initials-based queries; and the FTS version, best for large datasets where you expect full-string queries.
If we want to bundle both implementations, I think having a class that we make available via a function would be best to keep things organized.
Other than that, I'm hoping we can optimize the function even more and bring FTS filtering to version 2.
I was thinking sqlite FTS should be in a separate module (e.g. `index.py`). As you say, it's a very different thing to `filter()`.
To search on initials, you'd have to extract the initials from the raw data and create a corresponding index in the db. It might be possible via a database stored function. I don't know very much about them.
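For what it's worth, the stdlib `sqlite3` module does let you register a Python function and call it from SQL, which could serve as such a stored function (a hypothetical sketch, not code from the fork):

```python
import sqlite3

def initials(text):
    # 'Open Google Chrome' -> 'ogc'
    return ''.join(w[0] for w in text.lower().split())

con = sqlite3.connect(':memory:')
con.create_function('initials', 1, initials)
con.execute('CREATE TABLE items (title TEXT)')
con.execute("INSERT INTO items VALUES ('Open Google Chrome')")

# match on initials computed inside the database
row = con.execute("SELECT title FROM items WHERE initials(title) LIKE ?",
                  ('og%',)).fetchone()
assert row[0] == 'Open Google Chrome'
```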
I may look into that in the future, but I think its current implementation (only 3 flags) works well enough. I personally never anticipate users of my workflows using initials-based searches. Having access to both allows for the feature if you want it.
I've refactored the code. I have an `FTSDatabase` class that creates and searches the database. It's currently quite simple, but flexible in the places that matter most (going for 90% utility, not 100% utility). Feel free to give me your API suggestions.
You keep referring to code you've written and asking for input, but you never post a link to the code you're talking about :confused:
What am I supposed to be looking at?
I linked in my first post to my fork of your alfred index demo workflow. I'm posting my code there. Right now, I have the `FTSDatabase` class in the `fts.py` module, the two filter implementations in `search.py`, and a testing playground in `dev.py`.
I had it that way originally and then changed it for speed improvements. I actually do remove quotes from strings, but I'll go back and do more research to optimize the other way.
Doesn't matter. It's a lot more complex than just removing quotes (which you should actually be escaping, not removing).
For example, what if I hand your code a variable that ends with a backslash that escapes your terminating quote?
Your code is faster because it's relatively naïve.
Better to do it the right way and not roll your own. Chances are very good it has bugs the library's code doesn't.
@owenwater Just noticed this:
> `filter()` currently filters on `max_result` before `min_score`, which I think should be the other way round.
You're absolutely right. That's a definite bug. Results should be filtered by `min_score` first.
So this is the current structure I'm working towards with v2:

- `Workflow` class — Convenience wrapper around functions from other modules

I also want to move to py.test for the unit tests. It's much more powerful and has plugins that mean it's no longer necessary to make actual HTTP(S) calls or run tests from a wrapper script that sets up the testing environment.
Hi @deanishe, Looks great! Good work.
I'm not sure if you really need to move to py.test. Check this out: http://cramer.io/2014/05/20/mocking-requests-with-responses/
I'm also looking for an easy way to test HTTP requests in my workflows. Maybe this is something Alfred-Workflow could also provide? A simple test suite for workflows?
Hey @fniephaus,
Thanks, but I haven't actually done any work yet…
I've looked at mocking responses etc. The problem as I see it is, I am (or we are) going to have to mock out a whole load of situations to properly test web.py. That leaves us with a lot of new code that also needs testing…
py.test has a few awesome plugins that can make all that unnecessary. In particular, there's an httpbin plugin that starts a local instance of httpbin (which is what 90% of the tests use). That would make it super easy to rewrite the tests.
There's a local HTTP server plugin that should cover the rest of web.py and make testing the update code that calls the GitHub API as simple as grabbing the releases JSON with curl and pointing the plugin at it.
What's more, the powerful fixtures API in py.test should make it easy to create the Alfred-/workflow-like environment a lot of the tests require.
py.test seems to be capable of solving 90% of the issues with the unit tests without us having to write a bunch of support code that in turn needs testing.
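For instance, with the pytest-httpbin plugin, a `web.py` test could look something like this (a sketch assuming the plugin is installed; its `httpbin` fixture serves a local instance, so no network access is needed):

```python
import web  # Alfred-Workflow's HTTP module

def test_get_params(httpbin):
    # `httpbin` is the fixture provided by pytest-httpbin
    r = web.get(httpbin.url + '/get', params={'q': 'test'})
    r.raise_for_status()
    assert r.json()['args']['q'] == 'test'
```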
I'll add some links later. I'm currently stuck in the RegionalBahn on my phone.
So, what I was looking at was:
I can't find any of the other stuff right now, as I'm a wee bit drunk, but there was other good stuff that should make the tests relying on a workflow-like environment (env vars, info.plist etc.) really easy to test.
Also, I think `core.py` is a crappy name. I've thought about `alfred.py`, but that's also a bit crappy.

What's a good name for the module that contains all the Alfred-dependent variables/functions?
workflow.py? :-)
@m1keil
It's the obvious answer.
Tbh, I'm only after a different name because I'm sick of seeing `workflow.workflow` in the docs…
À la the `info.plist`, maybe `info.py`. Or `environ.py`, since it's information about the workflow environment. Possibly also `details.py` for similar reasons. I personally like the semantics of something like `from details import name, bundle_id`.
@fniephaus @m1keil @owenwater @smargh So, I've pushed the alpha of v2 to the v2 branch.
TL;DR
What about…
Mostly v2 has been about moving code around and moving towards `py.test`.

I've moved code from `workflow.py` into a lot of different files, but somehow it's still ~1500 lines long :disappointed:
I've been moving towards 3 goals:

- Allow the user to do (almost) everything `Workflow` does without `Workflow`. The `Workflow` class just becomes a higher-level API.
- Zero import side-effects. A problem with v1 is that importing any part of Alfred-Workflow often meant a `Workflow` instance being created. That's mostly fixed (in a very hacky fashion) in the most recent version of 1.x, but it's hacky. Libraries like `update.py` should remain passive as much as possible.
- A hook-based plugin architecture. Currently, there are no plugins and only a rudimentary signal/plugin system in place. In terms of core functionality, I want to move serializers, magic arguments and updating to plugins.

Data storage/caching

Currently, I'm stuck at the point of refactoring the data storage/caching API. It's required outside of `Workflow`, so it can't rely on that, but still needs a simple interface. I'm tending towards a `Serializer` superclass that all serializer plugins should inherit from. It would implement the `store(d)_data()` and `cache(d)_data()` APIs, and allow subclasses to only implement the `dump()` and `load()` methods. The superclass would have attributes like `filename`, so that subclasses, like `sqlite`, that can't work with file objects can do their thing without gymnastics.
There'd be a `get_serializer(format)` function in `storage.py`. It would return an object that has all the `stored_data()`, `cached_data()` etc. methods.
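A rough sketch of the shape I have in mind (all names provisional, not a final API):

```python
class Serializer(object):
    """Implements the shared caching/storage logic."""

    def __init__(self, filename):
        # exposed so subclasses that can't work with file objects
        # (e.g. sqlite) can use the path directly
        self.filename = filename

    def load(self, file_obj):
        raise NotImplementedError()

    def dump(self, obj, file_obj):
        raise NotImplementedError()


class JSONSerializer(Serializer):
    """Concrete serializers only implement load()/dump()."""

    def load(self, file_obj):
        import json
        return json.load(file_obj)

    def dump(self, obj, file_obj):
        import json
        json.dump(obj, file_obj)


_serializers = {'json': JSONSerializer}


def get_serializer(format):
    """Return the serializer registered for `format`, e.g. 'json'."""
    return _serializers[format]
```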
So my first question is: Do you think that's a reasonable implementation? Is there an obviously superior one?
Workflow variables

Also, I worry if the `env.py` module has a bit too much magic. As it stands, it imports `WorkflowEnvironment` from `_env.py` and essentially replaces itself (the `env` module) with it (the `WorkflowEnvironment` instance) on import. `WorkflowEnvironment` contains all the Alfred environmental variables and the stuff from `info.plist`, too. It loads environmental variables on instantiation, but `info.plist` variables on request (delaying parsing).

So my second question is: Is that too much magic? It's a nice touch from a user perspective, but it makes writing tests a pain in the arse (though the `WorkflowEnv` class in `tests/util.py` solves those issues). Is the unit test hack good enough, or should the use of `WorkflowEnvironment` be explicit? (`Workflow` would hide most of the difference in either case.)
Serializer and other APIs

So, I think the `Serializer` superclass should stick fairly closely to the existing data storage/caching APIs. What about the `Updater` and `MagicArgument` superclasses? What would be the best API for them to hook into `Workflow` (or the library in general)?

Specifying plugins and configuration

What's the best way for AW users to specify plugins? As an argument to `Workflow`? Via an API? Via a configuration file?
@shawnrice and I have been talking about a general configuration file for workflows. He would have his data for Packal and I would have mine for AW configuration.
He's more in favour of JSON, I want INI (I think it's more friendly to non-savvy users and doesn't break when TextEdit is used).
What do you think?
Is there anything else that could be moved to a signal+plugin architecture?
> I've been moving towards 3 goals:
>
> - Allow the user to do (almost) everything Workflow does without Workflow. The Workflow class just becomes a higher-level API.
> - Zero import side-effects. A problem with v1 is that importing any part of Alfred-Workflow often meant a Workflow instance being created. That's mostly fixed (in a very hacky fashion) in the most recent version of 1.x, but it's hacky. Libraries like update.py should remain passive as much as possible.
> - A hook-based plugin architecture. Currently, there are no plugins and only a rudimentary signal/plugin system in place. In terms of core functionality, I want to move serializers, magic arguments and updating to plugins.
I love the layered approach. Leave a single, simple access point for most users (workflow authors), but allow for others to tweak/build. I think this will also make it easier for changes and upgrades.
I've read the hook code, but don't quite get it. I will need to sit with that for a bit, but I like the overall idea. Again, stronger decoupling makes for easier evolution.
Data storage/caching
> Currently, I'm stuck at the point of refactoring the data storage/caching API. It's required outside of Workflow, so it can't rely on that, but still needs a simple interface. I'm tending towards a `Serializer` superclass that all serializer plugins should inherit from. It would implement the `store(d)_data()` and `cache(d)_data()` APIs, and allow subclasses to only implement the `dump()` and `load()` methods. The superclass would have attributes like `filename`, so that subclasses, like `sqlite`, that can't work with file objects can do their thing without gymnastics.
>
> There'd be a `get_serializer(format)` function in `storage.py`. It would return an object that has all the `stored_data()`, `cached_data()` etc. methods.
>
> So my first question is: Do you think that's a reasonable implementation? Is there an obviously superior one?
This makes good sense to me. One level of class inheritance isn't too gnarly. It keeps things well contained, and allows for evolution (like adding a `sqlite` serializer). My only question is about the user-facing API. I'm assuming the `Workflow` class will function the same (e.g. `Workflow.store_data(data, serializer=name)`). Does making the serializer the class make the API odd or confusing, like `JSON.store_data(data)`? Or are you planning on making module functions within `storage.py` that mirror the `Workflow` implementation without the class (e.g. `storage.store_data(data, serializer=name)`)? I think having `Serializer` objects that define their own `dump()` and `load()` methods makes a lot of sense, but this feels like something the user shouldn't need to see. They specify a different serializer if they want one as a param on storing, not as the object doing the storing.
Workflow variables
> Also, I worry if the `env.py` module has a bit too much magic. As it stands, it imports `WorkflowEnvironment` from `_env.py` and essentially replaces itself (the `env` module) with it (the `WorkflowEnvironment` instance) on import. `WorkflowEnvironment` contains all the Alfred environmental variables and the stuff from info.plist, too. It loads environmental variables on instantiation, but info.plist variables on request (delaying parsing).
>
> So my second question is: Is that too much magic? It's a nice touch from a user perspective, but it makes writing tests a pain in the arse (though the WorkflowEnv class in tests/util.py solves those issues). Is the unit test hack good enough, or should the use of WorkflowEnvironment be explicit? (Workflow would hide most of the difference in either case.)
Again, I've read the code but don't fully understand what's happening. Why can't you do all of this in `env.py`? Could you not put the `WorkflowEnvironment` class in `env.py`, and then put

```python
e = WorkflowEnvironment()
sys.modules[__name__] = e
```

at the bottom of the module? Why put it in a separate file? Or, why not just create aliases in the `env.py` module namespace, like:

```python
e = WorkflowEnvironment()
info = e.info
logfile = e.logfile
# etc...
```

Not as clean perhaps, but totally understandable as to what's happening.
But all of these thoughts are predicated on me not knowing what exactly is happening, so there may very well be implementation details that makes such things impossible. Just thoughts and suggestions.
Serializer and other APIs

> So, I think the `Serializer` superclass should stick fairly closely to the existing data storage/caching APIs. What about the `Updater` and `MagicArgument` superclasses? What would be the best API for them to hook into `Workflow` (or the library in general)?
Need to think on this.
Specifying plugins and configuration
> What's the best way for AW users to specify plugins? As an argument to Workflow? Via an API? Via a configuration file?
>
> @shawnrice and I have been talking about a general configuration file for workflows. He would have his data for Packal and I would have mine for AW configuration.
>
> He's more in favour of JSON, I want INI (I think it's more friendly to non-savvy users and doesn't break when TextEdit is used).
>
> What do you think?
JSON allows for robust data types, but does break easily. If you want something "unbreakable", then I think striving for `key=value` string-to-string configs (à la Bash) is probably safest.
All in all, things look good for an alpha. I think moving to a layered setup, with flexibility but sane defaults, is great. The best but trickiest part of Alfred-Workflow is that it is used by people who know basically no Python all the way to professional Python developers (and oddballs like me in the middle). So, aiming to have one thing that works awesomely for both groups is undoubtedly difficult, but this looks like it is 100% on the right path.
It's definitely a good implementation. But I don't think `Serializer` is necessary. For example, a serializer may not need the `filename` attribute you mentioned at all, or it may depend on multiple files. I just feel like duck typing is simple and Pythonic.

It took me some time to fully understand your purpose here, but the complication is absolutely necessary. A good feature should not be removed just because of the unit tests.
I haven't fully understood `hook.py` yet. But based on my understanding, a plugin contains features that may not be needed by everyone, so it should be easy to develop/read/install/use. My idea is based on two different types of plugins: ones that add new features, and ones that replace existing functions.

Not much needs to be done for the first type. For the second, a plugin loader should be able to automatically replace the function that shares the same name.

For example, to implement a `pretty_xml` plugin, all we need to do is create a new `workflow.py`/`Workflow.send_feedback()` function just like the one in the main code, but this one actually lives in the plugins folder.

I think a config file for plugins is too much. People should be able to enable a plugin by throwing a file/directory into `plugins/`, just like vim or zsh.
There are some classes that inherit from the `dict` class but haven't overridden functions like `clear()`, `pop()`, and `popitem()`.
@smargh
Tip: I keep editing your comments to add this, so … When you have a code block, append the language name to the opening ticks to get syntax highlighting:

    ```python
    # Call external processes
    from subprocess import Popen
    debug = True
    ```

gives you:

```python
# Call external processes
from subprocess import Popen
debug = True
```
> Could you not put the `WorkflowEnvironment` class in `env.py`

Not without a lot of messing around. Once the `WorkflowEnvironment` instance is put in `sys.modules`, it can no longer see anything in the `env.py` namespace, e.g. `os`, `plistlib` or `util`, so method calls fail. Initially, I aliased all the other globals within `WorkflowEnvironment.__init__`, but in the end it was cleaner to put `WorkflowEnvironment` in a separate module and define it normally.
> Or, why not just create aliases in the `env.py` module namespace

`WorkflowEnvironment` is lazy-loading. It won't explode without an `info.plist` until you try to access one of its variables that it loads from `info.plist`. If you start pulling things like `info` and `logfile` out of it, you won't be able to import `env.py` without an `info.plist` present.

As I said, I want to avoid import side-effects as much as possible. It makes testing much easier in particular, and is bad form in general. `env.py` already crosses that line (which is why I asked if it's "too much magic"), but it only fiddles with itself and won't immediately explode if it's not in a workflow environment.
The alternative is to use the `WorkflowEnvironment` class directly:

```python
from env import WorkflowEnvironment

env = WorkflowEnvironment()
...
```

or remove a lot of the magic and add an instance to `env.py`.

In `env.py`:

```python
class WorkflowEnvironment(object):
    ...

env = WorkflowEnvironment()
```

Then in other files:

```python
from env import env
...
```
> My only question is about the user-facing API. I'm assuming the `Workflow` class will function the same (e.g. `Workflow.store_data(data, serializer=name)`)

Yup. The `Workflow` API should remain largely the same. It's just a question of what the best implementation is. There will be some changes, though, which will be backwards-incompatible in many cases.

For example, I want to remove `Workflow.alfred_env` and `Workflow.info` and replace them with `Workflow.env` (which is an alias for the `workflow.env` module), which combines them. The contents of `info.plist` would still be available at `Workflow.env.info`.
@owenwater

> But I don't think `Serializer` is necessary. For example, a serializer may not need the `filename` attribute you mentioned at all, or it may depend on multiple files. I just feel like duck typing is simple and Pythonic.

Concrete serializers wouldn't be required to inherit from `Serializer`; they'd just need `load()` and `dump()` methods, as they do now.

The purpose is more to put the caching/storage logic into the `Serializer` class, where concrete serializers can override or change it if necessary, so they aren't limited to the `load()` + `dump()` API.

Currently, serializers get an object and an open file handle to work with. The `sqlite`-based serializer we're working on needs a filename, however. I think it'd be a cleaner solution if it could hook into the `Serializer` superclass to grab the filename instead of trying to extract it from an open file handle.
This would be an implementation detail and wouldn't affect the API.
> I haven't fully understood `hook.py` yet
Not surprising, as it doesn't actually do anything yet. The plugin system is modelled on pelican's. It will (hopefully) work like this:

1. Add plugin directories via `plugins.add_plugin_path('some/dir/path')`.
2. Load plugins via `plugins.load_plugins(['plugin1', 'plugin2', ...])`.
3. Each plugin provides a `register()` function. Typically, `register()` will import `hooks.py` and subscribe (connect) to one or more of the signals.
4. `Workflow` (mostly) sends signals when something happens (e.g. initialisation) or it wants something (e.g. a serializer for a given format).

I'm undecided whether all plugins should be loaded automatically or whether they need to be explicitly activated.
I'm not sure if plugins will prove to be of any use to users, but I think using them makes (or forces me to make) the internal API cleaner.
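To make the loading flow concrete, a hypothetical plugin might look like this (module layout and signal name are illustrative; `xml_generator_done` is the signal mentioned below):

```python
# plugins/pretty_xml.py (hypothetical)
from workflow import hooks

def on_xml_done(sender, xml=None):
    # prettify/reindent the generated XML before it goes to Alfred
    pass

def register():
    # called by the plugin loader; connect to the signals we care about
    hooks.xml_generator_done.connect(on_xml_done)
```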
> People should be able to enable a plugin by throwing a file/directory into `plugins/`, just like vim or zsh.
I was thinking of it more as a way to disable plugins, e.g. turn off magic arguments. Mostly, I see it as an alternative to the arguments to `Workflow`. If you have lots of Script Filters and a whole bunch of arguments to `Workflow`, being able to put them in a `workflow.ini` file might be easier. DRY and all that. It would be entirely optional.
It would also be a handy place to turn debugging on/off and add any settings your own plugins or scripts need:
```ini
[default]
version=1.1

[plugins]
; Add custom plugin path
paths=./plugins

[[enabled]]
; Your custom plugins
updater_bitbucket
serializer_xls
serializer_tsv
serializer_csv

[[disabled]]
; Turn off magic arguments
magic_default

[update]
; Settings for the updater
bitbucket_repo=deanishe/alfred-convert
; Check for new version every week
interval=7

[xls]
; Custom settings understood by your plugin(s)
format=xlsx
```
> a plugin loader should be able to automatically replace the function that shares the same name
That's the idea. The default plugin directory would be last in the resolution order, so a custom plugin with the same name would override a built-in one.
> For example, to implement a `pretty_xml` plugin
There's already a stub of one in the `plugins/` directory, and I've added an `xml_generator_done` signal that is called with the generated XML.

`feedback.XMLGenerator` could easily be discovered via hooks, though, if being able to use a custom generator were useful.
> There are some classes that inherit from the `dict` class but haven't overridden functions like `clear()`, `pop()`, and `popitem()`
Well spotted. `PersistentDict` definitely needs a `clear()` method. I'm not sure if it's a good idea to add the other methods, though, especially to `WorkflowEnvironment`.
Thanks for the explanation.
That makes more sense to me. It's good to have a superclass if multiple serializers share code.
My original idea, which might not have been very clear in my last post, was to use reflection rather than hooks, because a hook architecture requires implementation at every place that can be injected. However, it turned out not to be a bad idea after I read your example and thought about it again this morning: hooks look much safer, and they're enough for internal usage.
A configuration file is great if it includes not only enabled/disabled plugins but also other workflow-related information. The only concern I have here is having both `[[enabled]]` and `[[disabled]]`.
`pop()` and `popitem()` are important for `PersistentDict` because it's a low-level class. Users should be able to use it just like a dictionary, without any surprises. Because those two functions can change the content of the dictionary, `save()` should be called afterwards.
`WorkflowEnvironment` is a read-only class to me. My suggestion is to throw an exception on any attempt to set an attribute.
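A sketch of the overrides owenwater describes, assuming a `save()` method that writes the dict to disk:

```python
class PersistentDict(dict):

    def save(self):
        # write current contents to disk (implementation omitted)
        pass

    def pop(self, key, *args):
        value = super(PersistentDict, self).pop(key, *args)
        self.save()  # persist the mutation
        return value

    def popitem(self):
        item = super(PersistentDict, self).popitem()
        self.save()
        return item

    def clear(self):
        super(PersistentDict, self).clear()
        self.save()
```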
> My original idea, which might not have been very clear in my last post, was to use reflection rather than hooks
That was my original plan, before having a look at a few plugin/hook systems. The thing I like about the planned implementation is the ability to add hooks just about anywhere.
> The only concern I have here is having both `[[enabled]]` and `[[disabled]]`

I agree. As I said, I'm unsure whether to enable all plugins by default or require them to be explicitly enabled. I suppose on-by-default and lose the `[[enabled]]` section.
> `pop()` and `popitem()` are important for `PersistentDict` because it's a low-level class
I've never used them, but I've added them.
I'll vote for the `[[disabled]]` section as well, since it's simpler for most people.
I'll concede the JSON-to-INI point. The only resistance I ever had was that I've always thought the ini extension is ugly (based on previous Windows experience), that INI files themselves look ugly (on other aesthetic grounds), and that JSON files, for some strange reason, look pretty.

But all the reasons that make JSON fail are compelling reasons to switch to INI, and INI is easy enough to implement in PHP (and, I'm assuming, Ruby as well).
We just have to decide on a canonical filename. While `info.ini` would make sense for reasons of symmetry, we probably should go with `workflow.ini` because that just makes more sense.
I'm also planning on implementing this, somehow, in a PHP library that I'm writing.
It would be nice to have the `ini` file sectioned in a way that would indicate "global", "packal", "alfred workflow", "alphred", etc., but I'm not sure if that would put up too many barriers on the user side.
INI is ugly and JSON is far prettier, but INI is much easier for humans to edit, is natively supported by at least PHP and Python, and — importantly — supports comments (not allowed in canonical JSON).
We'd have to figure out a useable syntax subset supported by PHP and Python: they appear to use different syntaxes for nested sections.
Hadn't given the name much thought, tbh. `workflow.ini` would be a good choice for the reasons you give.
> It would be nice to have the ini file sectioned in a way that would indicate "global", "packal", "alfred workflow", "alphred", etc...
Indeed. There's common stuff, though. We'll have to figure out which variables belong in a `default` or generic `workflow` section (version?)
Can you guys stop including me in every email you send to each other?
Thank you.
You need to unsubscribe from the GH issue. The emails are automatic.
:grinning:
GitHub does send far too many emails, though. I had to unfollow most of the repos I was following on account of the silly amount of emails. Even the ones I've contributed to.
Oh wow. He's deleted his account.
I hope that wasn't all "our" emails…
How strange. He never contributed a pull request or anything, did he? He simply started "watching" the repo? That sucks.
My mistake. He didn't delete his account. I just got his name (Owen Min) confused with his username (owenwater).
Yeah, he contributed 2 pull requests. Some very valuable input, too. I really hope he chips in again in the future.
Pleasure to interact with. Not like you-know-who in #59 …
Ah. I see. Yeah, those pull requests were quite good (better than all of mine). And you'll always meet some unique people on the interwebs. It gets a lot more likely when you have a popular project (like this one).
To be fair, we discussed his accepted PR in detail in #47 before he submitted it (although I wish I'd accepted his other PR instead of implementing it myself and then having to add his additional — and very good — code by hand later).
The issue with your pull requests (and perhaps more generally), as @shawnrice and I have noted, is you tend to get too caught up in the specific nut you're trying to crack. As a result, you sometimes overlook how your idea might break other stuff (e.g. #51). It's always a good idea to ask yourself (a) why is what I'm planning bad/wrong and (b) can I generalise this to make it more useful?
I think everyone goes down such ratholes. The critical difference is remembering to ask the 2 above questions before going public…
GitHub does generate lots of email; I feel sorry for spamming that poor guy so much. Still, it's interesting that someone's ID is the same as my name.

BTW, I can write some code this weekend. Do you have anything high-priority for v2 that hasn't been started yet, so I can help? I just don't want to make the same mistake again. :smile:
I'm confused as fuck now. Who was it complained again? I'd got it into my head that it was you @owenwater, but I'm a bit drunk right now.
Oh wait. Did one of us type @owenmin instead of @owenwater? Haha!
Regarding coding, nothing that still needs implementing has a fixed API yet. As a result, I'm hesitant to ask someone to have a go at writing it for fear that it will need rewriting.
I'm currently working on the cached/stored data API, but all I've done so far is cut-and-paste it into `storage.py` (I haven't actually run any of the code yet).
The update system needs refactoring around hooks/plugins, but depends on the caching API. Keychain and magic args need moving out of `Workflow`, but they're mostly cut-and-paste jobs.
I think what would help most at this point would be an "audit" of the code I've written/moved around so far, and some brainy thinking about what's wrong, what's missing and how the update, serializer and magic argument APIs should work.
- `Workflow.filter()`?
- What goes in `workflow.ini` apart from version and update settings? How should it be integrated with `Workflow`? Does it go in `Workflow` or `env`? (I'm thinking `env`, so plugins can access the configuration without having to instantiate `Workflow`, which is something I want to avoid.)
- What else goes in `env`? For example, I've been thinking about `theme_light`/`theme_dark`, which would be a boolean calculated from the colour of the theme background, so users would know whether to use light or dark icons.
A significant refactoring of Alfred-Workflow is long overdue. The `Workflow` class has grown very large, and the unit tests are a horror show, largely due to the necessity of a workflow-like environment to run them in. I'd love to hear your ideas on how the library should be refactored and how the API should be changed.

My thoughts so far:

Refactoring

- Move the code that depends on the workflow environment (`info.plist` or `version` files, Alfred environmental variables) to its own module(s).
- Split `Workflow` into separate modules (data storage and cache functions, settings, filtering, text processing, serialisation etc.).

API changes

- Default `max_age` to 0 in `Workflow.cached_data()`, so cached data is always returned regardless of age by default.
- Move `version` from the `update_settings` dict to a `Workflow` argument (as an alternative to the `version` file).
- Allow `filter()` to be called with an empty `query`, whereupon it will return all `items`. This would mirror the way the built-in Python `filter()` function works, and allow @smargh to not bother checking whether `query` is empty. (See the sketch after this list.)

Paging @smargh @fniephaus @owenwater :grinning: