collective / collective.transmogrifier

A configurable pipeline, aimed at transforming content for import and export

Remove mandatory dependency from CMFCore #4

Closed datakurre closed 9 years ago

datakurre commented 10 years ago

If transmogrifier is so great, why not make it work outside the CMF context as well?

This pull removes the CMFCore dependency so that transmogrifier can be installed without the CMF/Plone KGS.

Next week I'm working on a non-Plone (actually, non-Python) migration project, where a migration pipeline from Transmogrifier should be useful.

I'll probably also add some kind of command-line hook/script, which accepts a Python class path as the context factory and another argument as the pipeline to execute. Any other ideas? Should I make all these things separate pull requests (even when they are all based on top of one another)?
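A rough sketch of how such a hook could turn a dotted path into a context factory. This is not the code from the pull request; `resolve_dotted_name` and `make_context` are hypothetical names, and the factory is assumed to be callable without arguments.

```python
import importlib


def resolve_dotted_name(dotted_name):
    """Import and return the object named by a path such as 'package.module.attr'."""
    module_name, _, attr_name = dotted_name.rpartition(".")
    if not module_name:
        # A bare name without dots is treated as a module.
        return importlib.import_module(dotted_name)
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


def make_context(factory_path):
    """Resolve the factory and call it without arguments to get the context."""
    factory = resolve_dotted_name(factory_path)
    return factory()


# Any zero-argument callable can stand in as a context factory here.
context = make_context("collections.OrderedDict")
```

Something like `zope.component.hooks.getSite` would be passed the same way when a Plone site is available.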

datakurre commented 10 years ago

Also, someone should re-enable the Travis CI build.

djay commented 10 years ago

Mr.migrator is a command line for transmogrifier. It doesn't yet include annotation support, but it does do overrides of values via the command line. It also has an interesting online help idea, which I'd like to replace with something that introspects blueprint docs. It might be cool to merge this into the transmogrifier base?

datakurre commented 10 years ago

Thanks. I'll check what's in mr.migrator.

Another upcoming thing would be z3c.autoinclude support (with plugin name "transmogrifier"). Also, I'll test how well venusianconfiguration works outside Plone :)

datakurre commented 9 years ago

Too many changes, so I closed this pull. As a summary, what I had to do during this week:

The new CLI does not yet have full feature parity with mr.migrator, but it should be able to run Plone pipelines with a custom context factory, which sets up the Plone context for the pipeline.

Making the zope.pagetemplate dependency optional ended up being controversial: although removing it would leave only a few dependencies, it would also disable most of the shipped blueprints/sections. This required a lot of conditionals in the code, which does not look very nice.

We did use this successfully with venusianconfiguration, and we could implement and register new blueprints with simple code like:

@configure_blueprint(name='common.id')
class Id(ConditionalBlueprint):
    def __iter__(self):
        counter = 1
        for item in self.previous:
            if self.condition(item):
                item.update({'id': counter})
            counter += 1
            yield item
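For readers without the framework at hand, the same section logic can be tried as a plain generator. This is only an illustration: `number_items` is a hypothetical name, and `previous` and `condition` stand in for what the blueprint base class provides.

```python
def number_items(previous, condition=lambda item: True):
    """Yield every item from `previous`, adding a running 'id' where `condition` holds."""
    counter = 1
    for item in previous:
        if condition(item):
            item.update({"id": counter})
        # The counter advances for every item, matched or not,
        # mirroring the blueprint above.
        counter += 1
        yield item


source = [{"title": "a"}, {"title": "b", "skip": True}, {"title": "c"}]
result = list(number_items(source, condition=lambda item: not item.get("skip")))
```

Note that skipped items still consume a counter value, so the ids of matched items can have gaps.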

All of this might be too much for collective.transmogrifier, which can be used only for Plone projects (because of its current CMFCore dependency).

Now I'm thinking of refactoring my branch into a separate package named just transmogrifier, which would include only the core, maybe simple Expression and Condition blueprints, and would replace the zope.pagetemplate dependency with Chameleon. It should then also be possible to make it Python 3 compatible. And all c.transmogrifier blueprints should still be compatible. I'll preserve the history for all the code I take from c.transmogrifier.

djay commented 9 years ago

There is already autoinclude support in Mr.migrator, and all the funnelweb blueprints use the plugin name transmogrify. Please don't invent a new one. I'd rather see Mr.migrator disappear and its features appear in core here.

datakurre commented 9 years ago

@djay I tried to re-use mr.migrator's entry point, but it does not work with zope.configuration >= 4.0, because "transmogrify" is not a real package:

ConfigurationError: ('Invalid value for', 'package', "ImportError: Couldn't import transmogrify, No module named transmogrify")

So, mr.migrator's entry-point name is not compatible with zope.configuration >= 4.0, unless we'd like to depend on the completely unrelated https://pypi.python.org/pypi/transmogrify
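The error above suggests the plugin name has to resolve to an importable package. A quick way to check that for a given name, using only the standard library (`is_importable` is a hypothetical helper, not part of zope.configuration or z3c.autoinclude):

```python
import importlib.util


def is_importable(package_name):
    """Return True if `package_name` can be located on the import path."""
    try:
        return importlib.util.find_spec(package_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a module has no spec.
        return False


# Whether this prints True depends on what is installed in the environment.
print(is_importable("transmogrify"))
```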

datakurre commented 9 years ago

So, it seems, I'm not merging mr.migrator to collective.transmogrifier, but

datakurre commented 9 years ago

@djay FYI: datakurre/transmogrifier now supports both the transmogrifier and transmogrify z3c.autoinclude plugin names. I realized that transmogrify worked for you because you always had at least one package declaring the transmogrify namespace package.

The new runner should now have feature parity with mr.migrator with a few changes in execution syntax.

For example, I can run full funnelweb.ttw import with buildout:

[buildout]
extends = http://dist.plone.org/release/4.3-latest/versions.cfg
parts = instance
versions = versions

extensions = mr.developer
sources = sources
auto-checkout = *

[sources]
transmogrifier = git https://github.com/datakurre/transmogrifier

[instance]
recipe = plone.recipe.zope2instance
eggs =
    Plone
    z3c.pt
    transmogrifier
    collective.transmogrifier
    plone.app.transmogrifier
    transmogrify.pathsorter
    funnelweb
user = admin:admin
zcml = plone.app.transmogrifier

[versions]
setuptools =
zc.buildout = 

(Note: a new Plone site cannot be created while funnelweb is in instance script.)

with command

bin/instance -OPlone run bin/transmogrify funnelweb.ttw commit.cfg crawler:url=http://datakurre.pandala.org "crawler:ignore=feeds\ncsi.js" --context=zope.component.hooks.getSite
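To make the `section:key=value` arguments in that command concrete, here is a minimal sketch of how such overrides could be collected. It is not the runner's actual parser; `parse_overrides` is a hypothetical name, and arguments that are not overrides are assumed to be handled elsewhere.

```python
def parse_overrides(arguments):
    """Collect 'section:key=value' arguments into {section: {key: value}}."""
    overrides = {}
    for argument in arguments:
        name, separator, value = argument.partition("=")
        section, colon, key = name.partition(":")
        if not separator or not colon:
            # Not an override (a pipeline name or a --option); skip it.
            continue
        overrides.setdefault(section, {})[key] = value
    return overrides


args = [
    "funnelweb.ttw",
    "commit.cfg",
    "crawler:url=http://datakurre.pandala.org",
    "crawler:ignore=feeds\\ncsi.js",
    "--context=zope.component.hooks.getSite",
]
overrides = parse_overrides(args)
```

Only the first `=` splits name from value, so values containing `=` or `:` (such as URLs) are kept intact.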

in detail

commit.cfg is a simple pipeline calling transaction.commit after each item is processed:

[transmogrifier]
pipeline = commit

[commit]
blueprint = transmogrifier.to_expression
modules = transaction
expression = python:modules['transaction'].commit()
mode = items  # run when all items have been yielded (here None, because it is a separate pipeline)

djay commented 9 years ago

Looks pretty cool. I like that we will finally have a transmogrifier that is free of CMF and can be used from the commandline. It should have been done a long time ago. Being able to join two pipelines on the commandline is a nice feature. Did you include the zcml load feature? That was needed because most blueprints don't have autoinclude and you might not be running it inside zope.

The only feature I can think of that you didn't implement is displaying help on the blueprint arguments themselves. The way I did it was kind of ugly and used special markup in the pipeline itself. I think a much better way would be to use a convention in the docstring of the blueprint definition, but perhaps it's not really the most important feature. It did result in useful help such as below.

$ bin/funnelweb --help
Usage: funnelweb [options]

Options:
  -h, --help        show this help message and exit
  --pipeline=FILE   Transmogrifier pipeline.cfg to use
  --show-pipeline   Show contents of the pipeline
  --zcml=ZCML       modules in the path to load zcml from

crawler: Crawls site or cache for content

--crawler:url=URL   the top url to crawl
--crawler:start-urls=LIST
                    additional urls to crawl at the start
--crawler:ignore=LIST
                    list of regex for urls to not crawl
--crawler:cache=DIR
                    local directory to read crawled items from instead of
                    accessing the site directly
--crawler:patterns=LIST
                    Regular expressions to substitute before html is
                    parsed. New line seperated
--crawler:subs=LIST
                    Text to replace each item in patterns. Must be the
                    same number of lines as patterns
--crawler:maxsize=BYTES
                    don't crawl anything larger than this
--crawler:max=INT   Limit crawling to this number of pages
--crawler:ignore_robots=BOOL
                    Ignore robots.txt for when you really want their
                    content
--crawler:debug     show extra debug information

itemcache:

typeguess: Sets Plone content type based on mime-type

--typeguess:condition=TAL
                    Tal expression returning boolean called for each
                    'item'
--typeguess:debug   show extra debug information

template1: Provide XPath for title, description, text etc. Specify rules like --template1:title="text //p[1]" --template1:text="html //p"

--template1:debug   show extra debug information
--template1:myfield=FORMAT XPATH
                    A rule to extract content from pages. XPATH must match
                    a node unless FORMAT is "optional". FORMAT of "text"
                    will strip html. FORMAT of "html" will return the
                    matched html.

template2: Used if no previous templates matched. see template1 for options

template3: Used if no previous templates matched. see template1 for options

template4: Used if no previous templates matched. see template1 for options

templateauto: Guesses XPaths of content by performing a cluster analysis of all the content not already matched

--templateauto:condition=TAL
                    A TAL expression returning boolean called for each
                    'item'. Turned off by default.
--templateauto:debug
                    show extra debug information

indexguess: Determines an item is a default page for a container if it has many links to items in that container even if not contained in that folder

--indexguess:condition=TAL
                    tal expression returning boolean called for each
                    'item'
--indexguess:default_pages=LIST
                    names that indication page should be a defaultpage
--indexguess:debug  show extra debug information
--indexguess:min_links=INT
                    If a page has this many links to a single folder's
                    content it will be moved
--indexguess:max_uplinks=INT
                    If a page has more than this many links parent folders
                    then don't more it

sitemapper: Uses a indented html with links in to rearrange those links in the site

--sitemapper:condition=TAL
                    Which item to use as the sitemap
--sitemapper:debug  show extra debug information

drop: Useful to drop certain content

--drop:condition=TAL
                    TAL expression returning boolean called for each
                    'item'
--drop:debug        show extra debug information

attachmentguess: Finds items only referenced by one page and moves them into a new folder with the page as the default view

--attachmentguess:condition=TAL
                    TAL expression returning boolean called for each
                    'item'
--attachmentguess:debug
                    show extra debug information
--attachmentguess:defaultpage=NAME
                    name to give created defaultpages

hideguess: Picks content which won't be shown in the site navigation

--hideguess:condition=TAL
                    TAL expression to pick which items should be hidden

addfolders:

--addfolders:default_containers=TYPE
                    Type to set when creating folders
--addfolders:debug  show extra debug information

titleguess: Tries to find better page titles by analysing backlink text

--titleguess:condition=TAL
                    TAL expression returning boolean called for each
                    'item'
--titleguess:debug  show extra debug information
--titleguess:ignore=LIST
                    don't use backlink text containing these substrings

urltidy: Applies title normalisation rules remove invalid chars from urls. It will also ensure all internal links are corrected

--urltidy:debug     show extra debug information
--urltidy:link_expr=TAL
                    TAL expression to set new value of the path
--urltidy:use_title=TAL
                    TAL expression to switch id to use the title
--urltidy:invalid_ids
                    Rename the reserved words by Plone link_expr =
                    python:item['_path'].rsplit('.',1)[-1] in
                    ['html','asp','php'] and
                    item['_path'].rsplit('.',1)[0] or item['_path']

changetype: Switch the type of the created object if desired

--changetype:value=TAL
                    TAL expression to give the new value for the Type of
                    object.

ploneupload: Adds content to plone via xmlrpc

--ploneupload:target=URL
                    The base url for where all content should be created.
                    Can support basic authentication e.g. target =
                    http://admin:admin@localhost:8080/Plone
--ploneupload:debug
                    show extra debug information
--ploneupload:skip-until-path=STRING
                    won't update anything until it reaches this path

ploneupdate: Updates content of existing object on a remote plone site via xmlrpc

--ploneupdate:target=URL
                    the base url for where all content should be updated.
                    Can support basic authentication
--ploneupdate:skip-unmodified=BOOLEAN
                    if true the modification date will be compared with
                    that on server and updating skipped
--ploneupdate:skip-until-path=STRING
                    won't update anything until it reaches this path
--ploneupdate:skip-fields=LIST
                    don't update these fields during update
--ploneupdate:skip-existing=BOOLEAN
                    if creation-key is set then update, otherwise skip
--ploneupdate:debug
                    show extra debug information

ploneportlets: Sets left and right portlets

--ploneportlets:target=URL
                    the base url for where all content should be updated.
                    Can support basic authentication
--ploneportlets:debug
                    show extra debug information

plonehide: Hide items from the navigation (hints to which items should be hidden are set earlier in pipeline) by default it will hide items not linked to outside of any body text

--plonehide:debug   show extra debug information

publish: Set the workflow transition

--publish:value=TAL
                    TAL expression to return the transition to workflow

plonepublish: Publish or otherwise change the workflow state of remote plone content

--plonepublish:debug
                    show extra debug information
--plonepublish:skip-until-path=STRING
                    won't update anything until it reaches this path

plonealias: Creates aliases for items that have moved

--plonealias:skip-until-path=STRING
                    won't update anything until it reaches this path

ploneprune: Delete objects which are on the remote site, but not in local copy

--ploneprune:condition=TAL
                    TAL expression for which folders to remove old content
--ploneprune:debug  show extra debug information
--ploneprune:trash  folder to move pruned items (instead of delete)

localupload: Save transformed site locally

--localupload:output=DIR
                    directory to load transformed content into for
                    debugging
--localupload:debug
                    show extra debug information

datakurre commented 9 years ago

Dylan Jay wrote:

Being able to join two pipelines on the commandline is a nice feature.

Currently it just executes them in serial, but does not pass items from the previous pipeline to the next one. That would be possible with a few changes to the Transmogrifier executor, but I had (non-Plone) use cases where that was not the wanted behavior.
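The difference between the two behaviors can be shown with plain generators. This is a toy model, not the Transmogrifier executor; `run_pipeline`, `emit` and `double` are made-up names for illustration.

```python
def run_pipeline(sections, source=()):
    """Thread `source` through each section in order and return the emitted items."""
    items = iter(source)
    for section in sections:
        items = section(items)
    return list(items)


def emit(*values):
    """Build a source section that passes items through, then appends `values`."""
    def section(previous):
        yield from previous
        for value in values:
            yield {"value": value}
    return section


def double(previous):
    """Section that doubles the 'value' of every item it receives."""
    for item in previous:
        yield {"value": item["value"] * 2}


first = [emit(1, 2)]
second = [double]

# Serial but separate: the second pipeline starts from an empty source.
serial = [run_pipeline(first), run_pipeline(second)]

# Chained: items emitted by the first pipeline feed the second.
chained = run_pipeline(second, source=run_pipeline(first))
```

In the serial case the second pipeline never sees the items of the first, which is the behavior described above.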

Did you include the zcml load feature?

Not yet. How about

[transmogrifier]
require (or include?) = package.name package.name

instead of yet another command-line argument. (I've also been wondering whether the --context argument should be read from transmogrifier:context instead.)

I think a much better way would be to use a convention in the docstring of the blueprint definition.

Supporting docstrings sounds like a good idea, and it would encourage people to write them.
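One possible shape for such a convention, purely as a sketch: the first docstring paragraph is the summary and each following `option: description` line documents one option. Neither the convention nor `blueprint_help` exists in any of the packages discussed here; the `Crawler` class is a stand-in.

```python
import inspect


class Crawler:
    """Crawls site or cache for content.

    url: the top url to crawl
    max: limit crawling to this number of pages
    """


def blueprint_help(name, blueprint):
    """Format help text for one blueprint from its docstring."""
    doc = inspect.getdoc(blueprint) or ""
    summary, _, details = doc.partition("\n\n")
    lines = ["%s: %s" % (name, summary)]
    for line in details.splitlines():
        option, separator, description = line.partition(":")
        if separator:
            lines.append(
                "  --%s:%s  %s" % (name, option.strip(), description.strip())
            )
    return "\n".join(lines)


help_text = blueprint_help("crawler", Crawler)
print(help_text)
```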

-Asko

datakurre commented 9 years ago

Asko Soukka wrote:

Did you include the zcml load feature?

@djay I forgot that this was dropped, because in a Plone context it's now enough to add the ZCML packages to the zcml list of the instance part (as already done for plone.app.transmogrifier in the example).

[instance]
recipe = plone.recipe.zope2instance
eggs = ...
zcml = ...

-Asko

djay commented 9 years ago

Only if you are running inside Zope, which I don't do and which Mr.migrator was designed not to require. I like the idea of putting zcml in the pipeline.

datakurre commented 9 years ago

Just adding support for --include=package (or --include=package:filename.zcml) was so much easier that I did it. Yet, I renamed it from --zcml to just --include (which is the actual zope.configuration API call here), because I'm still dreaming of merging my Python configuration syntax into zope.configuration in the far future.
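For illustration, the two accepted forms could be split like this. This is a sketch and not the runner's code: `parse_include` is a hypothetical name, and falling back to `configure.zcml` when no filename is given is an assumption based on the usual ZCML convention.

```python
def parse_include(value, default_filename="configure.zcml"):
    """Split 'package' or 'package:filename.zcml' into (package, filename)."""
    package, separator, filename = value.partition(":")
    if not separator or not filename:
        # Assumed default when only a package name is given.
        filename = default_filename
    return package, filename


plain = parse_include("plone.app.transmogrifier")
explicit = parse_include("my.package:meta.zcml")
```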

I also looked into passing items from one pipeline to another. It is not too difficult, but it would make the new transmogrifier incompatible with the old one. I'd prefer to keep the current way of "just executing pipelines in serial, but separate" and add a built-in blueprint for executing a named pipeline, so that it's trivial to make pipelines from pipelines. Like the splitter blueprint, but simpler.