Closed datakurre closed 9 years ago
Also, someone should re-enable Travis-CI-build.
Mr.migrator is a command line for transmogrifier. It doesn't yet include annotation support, but it does support overriding values via the command line. It also has an interesting online help idea, which I'd like to replace with something that introspects blueprint docs. It might be cool to merge this into the transmogrifier base? On 8 Nov 2014 13:14, "Asko Soukka" notifications@github.com wrote:
If transmogrifier is so great, why not make it work outside the CMF context as well?
This pull removes the CMFCore dependency so that transmogrifier can be installed without the CMF/Plone KGS.
Next week I'm working on a non-Plone (actually, non-Python) migration project, where a migration pipeline from Transmogrifier should be useful.
I'll probably also add some kind of command-line hook/script, which accepts a Python class path as the context factory and another argument as the pipeline to execute. Any other ideas? Should I make all things separate pull requests (even when they are all based on top of one another)?
You can merge this Pull Request by running
git pull https://github.com/datakurre/collective.transmogrifier master
Or view, comment on, or merge it at:
https://github.com/collective/collective.transmogrifier/pull/4
Commit Summary
- Remove mandatory dependency from CMFCore
File Changes
- M setup.py https://github.com/collective/collective.transmogrifier/pull/4/files#diff-0 (14)
- M src/collective/transmogrifier/sections/codec.py https://github.com/collective/collective.transmogrifier/pull/4/files#diff-1 (9)
- M src/collective/transmogrifier/sections/configure.zcml https://github.com/collective/collective.transmogrifier/pull/4/files#diff-2 (1)
- M src/collective/transmogrifier/sections/tests.py https://github.com/collective/collective.transmogrifier/pull/4/files#diff-3 (32)
- M src/collective/transmogrifier/tests/test_transmogrifier.py https://github.com/collective/collective.transmogrifier/pull/4/files#diff-4 (32)
- M src/collective/transmogrifier/transmogrifier.py https://github.com/collective/collective.transmogrifier/pull/4/files#diff-5 (5)
Patch Links:
- https://github.com/collective/collective.transmogrifier/pull/4.patch
- https://github.com/collective/collective.transmogrifier/pull/4.diff
— Reply to this email directly or view it on GitHub https://github.com/collective/collective.transmogrifier/pull/4.
Thanks. I'll check what's in mr.migrator.
Another upcoming thing would be z3c.autoinclude support (with plugin name "transmogrifier"). Also, I'll test how well venusianconfiguration works outside Plone :)
Too many changes, so I closed this pull. As a summary, what I had to do during this week:
The new CLI does not have full feature parity with mr.migrator (yet it should be able to run Plone pipelines with a custom context factory, which sets up the Plone context for the pipeline).
Making the zope.pagetemplate dependency optional ended up being controversial: although removing it would leave only a few dependencies, it would also disable most of the shipped blueprints/sections. This required a lot of conditionals in the code, which does not look very nice.
We did use this successfully with venusianconfiguration and we could implement and register new blueprints with simple code like
@configure_blueprint(name='common.id')
class Id(ConditionalBlueprint):
    def __iter__(self):
        counter = 1
        for item in self.previous:
            if self.condition(item):
                item.update({'id': counter})
                counter += 1
            yield item
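For readers without venusianconfiguration or transmogrifier at hand, the same idea can be sketched in plain Python. Everything below (ConditionalSection, IdSection, the condition callable) is a hypothetical stand-in and not the API of either package; it only shows how a section wraps the previous iterator, touches matching items, and passes every item on.

```python
class ConditionalSection:
    """Hypothetical stand-in for a blueprint base class with a condition."""

    def __init__(self, previous, condition=lambda item: True):
        self.previous = previous    # iterator of item dicts from earlier sections
        self.condition = condition  # callable deciding whether to touch an item


class IdSection(ConditionalSection):
    """Give every matching item a running integer id."""

    def __iter__(self):
        counter = 1
        for item in self.previous:
            if self.condition(item):
                item.update({'id': counter})
                counter += 1
            # Items are always passed on, matched or not.
            yield item


source = [{'title': 'a'}, {'title': 'skip'}, {'title': 'b'}]
section = IdSection(iter(source), condition=lambda item: item['title'] != 'skip')
result = list(section)
print(result)
```

Because the section is just an iterable over the previous one, several of them can be nested to form a pipeline without any registry at all.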
All of this might be too much for collective.transmogrifier, which can only be used in Plone projects (because of its current CMFCore dependency).
Now I'm thinking of refactoring my branch into a separate package named transmogrifier, which would include just the core, maybe simple Expression and Condition blueprints, and replace the zope.pagetemplate dependency with Chameleon. It should then also be possible to make it Python 3 compatible. All c.transmogrifier blueprints should still be compatible. I'll preserve the history for all the code I take from c.transmogrifier.
There is already autoinclude support in Mr.migrator and all the funnelweb blueprints using the plugin transmogrify. Please don't invent a new one. I'd rather see Mr.migrator disappear and its features appear in core here.
@djay I tried to re-use mr.migrator's entry point, but it does not work with zope.configuration >= 4.0, because "transmogrify" is not a real package:
ConfigurationError: ('Invalid value for', 'package', "ImportError: Couldn't import transmogrify, No module named transmogrify")
So, mr.migrator's entry-point-name is not compatible with zope.configuration >= 4.0, unless we'd like to depend on completely unrelated https://pypi.python.org/pypi/transmogrify
So, it seems, I'm not merging mr.migrator to collective.transmogrifier, but
@djay FYI. datakurre/transmogrifier now supports both transmogrifier and transmogrify z3c.autoinclude packages. I realized that transmogrify worked for you, because you always had at least one package declaring the transmogrify namespace package.
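The underlying cause can be reproduced without zope.configuration: a plugin target name only works if Python can import it. The sketch below is an illustration under assumptions, not the zope.configuration code path; the target name is hypothetical, and it relies on Python 3 treating a plain directory on sys.path as a namespace package (PEP 420), whereas the packages discussed here declared their namespace explicitly.

```python
import importlib
import os
import sys
import tempfile


def can_import(name):
    """Return True if `name` resolves to an importable package or module."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Hypothetical name standing in for an entry-point target such as "transmogrify".
TARGET = "hypothetical_plugin_target_ns"

# Nothing provides the name yet, so the import fails.
before = can_import(TARGET)

# Simulate an installed distribution that provides the name as a namespace
# package: on Python 3 a plain directory on sys.path is enough.
site_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(site_dir, TARGET))
sys.path.insert(0, site_dir)
importlib.invalidate_caches()

after = can_import(TARGET)
print(before, after)
```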
The new runner should now have feature parity with mr.migrator with a few changes in execution syntax.
For example, I can run a full funnelweb.ttw import with this buildout:

[buildout]
extends = http://dist.plone.org/release/4.3-latest/versions.cfg
parts = instance
versions = versions
extensions = mr.developer
sources = sources
auto-checkout = *

[sources]
transmogrifier = git https://github.com/datakurre/transmogrifier

[instance]
recipe = plone.recipe.zope2instance
eggs =
    Plone
    z3c.pt
    transmogrifier
    collective.transmogrifier
    plone.app.transmogrifier
    transmogrify.pathsorter
    funnelweb
user = admin:admin
zcml = plone.app.transmogrifier

[versions]
setuptools =
zc.buildout =

(Note: a new Plone site cannot be created while funnelweb is in instance script.)
with command
bin/instance -OPlone run bin/transmogrify funnelweb.ttw commit.cfg crawler:url=http://datakurre.pandala.org "crawler:ignore=feeds\ncsi.js" --context=zope.component.hooks.getSite
In detail:
- zope.component.hooks.getSite is the value given to --context in the command above
- commit.cfg is a simple pipeline calling transaction.commit after each item is processed:
[transmogrifier]
pipeline = commit

[commit]
blueprint = transmogrifier.to_expression
modules = transaction
expression = python:modules['transaction'].commit()
mode = items  # run when all items have been yielded (here None, because this is a separate pipeline)
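To make the structure of such a pipeline file and of the section:option=value overrides in the command above concrete, here is a minimal sketch using only the standard library. It assumes configparser as a stand-in for transmogrifier's own loader, and load_pipeline and apply_overrides are hypothetical helper names; the inline comment on the mode line is left out to keep the parsing question aside.

```python
import configparser

PIPELINE_CFG = """
[transmogrifier]
pipeline = commit

[commit]
blueprint = transmogrifier.to_expression
modules = transaction
expression = python:modules['transaction'].commit()
mode = items
"""


def load_pipeline(text):
    """Parse pipeline text into {section: {option: value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def apply_overrides(config, overrides):
    """Apply 'section:option=value' strings on top of a parsed config."""
    for override in overrides:
        # Split on the first '=' only, so URLs in the value stay intact.
        key, _, value = override.partition('=')
        section, _, option = key.partition(':')
        config.setdefault(section, {})[option] = value
    return config


config = load_pipeline(PIPELINE_CFG)
apply_overrides(config, ['crawler:url=http://datakurre.pandala.org'])

# Section names listed under [transmogrifier] pipeline, in order.
sections = config['transmogrifier']['pipeline'].split()
print(sections, config['commit']['blueprint'], config['crawler']['url'])
```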
Looks pretty cool. I like that we will finally have a transmogrifier that is free of CMF and can be used from the commandline. It should have been done a long time ago. Being able to join two pipelines on the commandline is a nice feature. Did you include the zcml load feature? That was needed because most blueprints don't have autoinclude and you might not be running it inside zope.
The only feature I can think of that you didn't implement is displaying help on the blueprint arguments themselves. The way I did it was kind of ugly and used special markup in the pipeline itself. I think a much better way would be to use a convention in the docstring of the blueprint definition, but perhaps it's not really the most important feature. It did result in useful help such as below.
$ bin/funnelweb --help
Usage: funnelweb [options]
Options:
-h, --help show this help message and exit
--pipeline=FILE Transmogrifier pipeline.cfg to use
--show-pipeline Show contents of the pipeline
--zcml=ZCML modules in the path to load zcml from
crawler: Crawls site or cache for content
--crawler:url=URL the top url to crawl
--crawler:start-urls=LIST
additional urls to crawl at the start
--crawler:ignore=LIST
list of regex for urls to not crawl
--crawler:cache=DIR
local directory to read crawled items from instead of
accessing the site directly
--crawler:patterns=LIST
Regular expressions to substitute before html is
parsed. New line seperated
--crawler:subs=LIST
Text to replace each item in patterns. Must be the
same number of lines as patterns
--crawler:maxsize=BYTES
don't crawl anything larger than this
--crawler:max=INT Limit crawling to this number of pages
--crawler:ignore_robots=BOOL
Ignore robots.txt for when you really want their
content
--crawler:debug show extra debug information
itemcache:
typeguess: Sets Plone content type based on mime-type
--typeguess:condition=TAL
Tal expression returning boolean called for each
'item'
--typeguess:debug show extra debug information
template1: Provide XPath for title, description, text etc. Specify rules like --template1:title="text //p[1]" --template1:text="html //p"
--template1:debug show extra debug information
--template1:myfield=FORMAT XPATH
A rule to extract content from pages. XPATH must match
a node unless FORMAT is "optional". FORMAT of "text"
will strip html. FORMAT of "html" will return the
matched html.
template2: Used if no previous templates matched. see template1 for options
template3: Used if no previous templates matched. see template1 for options
template4: Used if no previous templates matched. see template1 for options
templateauto: Guesses XPaths of content by performing a cluster analysis of all the content not already matched
--templateauto:condition=TAL
A TAL expression returning boolean called for each
'item'. Turned off by default.
--templateauto:debug
show extra debug information
indexguess: Determines an item is a default page for a container if it has many links to items in that container even if not contained in that folder
--indexguess:condition=TAL
tal expression returning boolean called for each
'item'
--indexguess:default_pages=LIST
names that indication page should be a defaultpage
--indexguess:debug show extra debug information
--indexguess:min_links=INT
If a page has this many links to a single folder's
content it will be moved
--indexguess:max_uplinks=INT
If a page has more than this many links parent folders
then don't more it
sitemapper: Uses a indented html with links in to rearrange those links in the site
--sitemapper:condition=TAL
Which item to use as the sitemap
--sitemapper:debug show extra debug information
drop: Useful to drop certain content
--drop:condition=TAL
TAL expression returning boolean called for each
'item'
--drop:debug show extra debug information
attachmentguess: Finds items only referenced by one page and moves them into a new folder with the page as the default view
--attachmentguess:condition=TAL
TAL expression returning boolean called for each
'item'
--attachmentguess:debug
show extra debug information
--attachmentguess:defaultpage=NAME
name to give created defaultpages
hideguess: Picks content which won't be shown in the site navigation
--hideguess:condition=TAL
TAL expression to pick which items should be hidden
addfolders:
--addfolders:default_containers=TYPE
Type to set when creating folders
--addfolders:debug show extra debug information
titleguess: Tries to find better page titles by analysing backlink text
--titleguess:condition=TAL
TAL expression returning boolean called for each
'item'
--titleguess:debug show extra debug information
--titleguess:ignore=LIST
don't use backlink text containing these substrings
urltidy: Applies title normalisation rules remove invalid chars from urls. It will also ensure all internal links are corrected
--urltidy:debug show extra debug information
--urltidy:link_expr=TAL
TAL expression to set new value of the path
--urltidy:use_title=TAL
TAL expression to switch id to use the title
--urltidy:invalid_ids
Rename the reserved words by Plone link_expr =
python:item['_path'].rsplit('.',1)[-1] in
['html','asp','php'] and
item['_path'].rsplit('.',1)[0] or item['_path']
changetype: Switch the type of the created object if desired
--changetype:value=TAL
TAL expression to give the new value for the Type of
object.
ploneupload: Adds content to plone via xmlrpc
--ploneupload:target=URL
The base url for where all content should be created.
Can support basic authentication e.g. target =
http://admin:admin@localhost:8080/Plone
--ploneupload:debug
show extra debug information
--ploneupload:skip-until-path=STRING
won't update anything until it reaches this path
ploneupdate: Updates content of existing object on a remote plone site via xmlrpc
--ploneupdate:target=URL
the base url for where all content should be updated.
Can support basic authentication
--ploneupdate:skip-unmodified=BOOLEAN
if true the modification date will be compared with
that on server and updating skipped
--ploneupdate:skip-until-path=STRING
won't update anything until it reaches this path
--ploneupdate:skip-fields=LIST
don't update these fields during update
--ploneupdate:skip-existing=BOOLEAN
if creation-key is set then update, otherwise skip
--ploneupdate:debug
show extra debug information
ploneportlets: Sets left and right portlets
--ploneportlets:target=URL
the base url for where all content should be updated.
Can support basic authentication
--ploneportlets:debug
show extra debug information
plonehide: Hide items from the navigation (hints to which items should be hidden are set earlier in pipeline) by default it will hide items not linked to outside of any body text
--plonehide:debug show extra debug information
publish: Set the workflow transition
--publish:value=TAL
TAL expression to return the transition to workflow
plonepublish: Publish or otherwise change the workflow state of remote plone content
--plonepublish:debug
show extra debug information
--plonepublish:skip-until-path=STRING
won't update anything until it reaches this path
plonealias: Creates aliases for items that have moved
--plonealias:skip-until-path=STRING
won't update anything until it reaches this path
ploneprune: Delete objects which are on the remote site, but not in local copy
--ploneprune:condition=TAL
TAL expression for which folders to remove old content
--ploneprune:debug show extra debug information
--ploneprune:trash folder to move pruned items (instead of delete)
localupload: Save transformed site locally
--localupload:output=DIR
directory to load transformed content into for
debugging
--localupload:debug
show extra debug information
Dylan Jay wrote:
Being able to join two pipelines on the commandline is a nice feature.
Currently it just executes them in serial, but does not pass items from the previous pipeline to the next one. That would be possible with a few changes to the Transmogrifier executor, but I had (non-Plone) use cases where that was not the wanted behavior.
Did you include the zcml load feature?
Not yet. How about
[transmogrifier]
require (or include?) = package.name package.name
instead of yet another command line argument. (I've been wondering whether the --context argument should also be read from transmogrifier:context instead.)
I think a much better way would be to use a convention in the docstring of the blueprint definition.
Supporting docstrings sounds like a good idea; it would encourage people to write them.
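As a rough sketch of what docstring-driven help could look like: the convention used here (first line as summary, then "option: description" lines) and the names Crawler and blueprint_help are assumptions made up for illustration; no such convention exists in transmogrifier or mr.migrator as far as this thread shows.

```python
import inspect


class Crawler:
    """Crawls site or cache for content.

    url: the top url to crawl
    max: limit crawling to this number of pages
    """


def blueprint_help(name, blueprint):
    """Build help text for a section from its blueprint's docstring."""
    doc = inspect.getdoc(blueprint) or ''
    lines = doc.splitlines()
    summary = lines[0] if lines else ''
    out = ['%s: %s' % (name, summary)]
    for line in lines[1:]:
        # Only lines of the form "option: description" become options.
        option, sep, description = line.partition(':')
        if sep and option.strip():
            out.append('  --%s:%s  %s' % (name, option.strip(), description.strip()))
    return '\n'.join(out)


help_text = blueprint_help('crawler', Crawler)
print(help_text)
```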
-Asko
Asko Soukka wrote:
Did you include the zcml load feature?
@djay I forgot that this was dropped, because in a Plone context it's now enough to add the packages to the zcml list of the instance part (as already done for plone.app.transmogrifier in the example).
[instance]
recipe = plone.recipe.zope2instance
eggs = ...
zcml = ...
-Asko
Only if you are running inside Zope, which I don't do and which Mr.migrator was designed not to require. I like the idea of putting zcml in the pipeline.
Just adding support for --include=package (or --include=package:filename.zcml) was so much easier that I did it. Yet, I renamed it from --zcml to just --include (which is the actual zope.configuration API call here), because I'm still dreaming of merging my Python configuration syntax into zope.configuration in the far future.
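A minimal sketch of how such an --include value could be split into its parts. parse_include is a hypothetical helper, not the runner's actual code, and the configure.zcml default is an assumption modelled on the default of zope.configuration's include directive.

```python
def parse_include(spec, default_file='configure.zcml'):
    """Split 'package' or 'package:filename.zcml' into (package, filename)."""
    package, sep, filename = spec.partition(':')
    # Fall back to the default file when no filename part was given.
    return package, (filename if sep and filename else default_file)


plain = parse_include('plone.app.transmogrifier')
explicit = parse_include('plone.app.transmogrifier:meta.zcml')
print(plain, explicit)
```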
I also looked into passing items from one pipeline to another. It is not too difficult, but it would make the new transmogrifier incompatible with the old one. I'd prefer to keep the current way of "just executing pipelines in serial, but separate" and add a built-in blueprint for executing a named pipeline, so that it's trivial to make pipelines from pipelines. Like the splitter blueprint, but simpler.
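The difference between the two behaviours can be sketched with plain generators. run_pipeline, source and mark are hypothetical names and this is not the Transmogrifier executor; it only contrasts "serial, but separate" with feeding one pipeline's items into the next.

```python
def run_pipeline(sections, items=()):
    """Run hypothetical sections (generator functions) over items and drain them."""
    stream = iter(items)
    for section in sections:
        stream = section(stream)
    return list(stream)


def source(previous):
    """Pass on incoming items, then add two new ones."""
    yield from previous
    yield {'path': '/a'}
    yield {'path': '/b'}


def mark(previous):
    """Flag every item that passes through."""
    for item in previous:
        item['seen'] = True
        yield item


# "Serial, but separate": each pipeline runs on its own, one after another.
first = run_pipeline([source])
second = run_pipeline([mark])  # receives no items from `first`

# The alternative that was considered: feed the first pipeline's items onward.
chained = run_pipeline([mark], items=run_pipeline([source]))
print(len(first), len(second), len(chained))
```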