Adobe-Consulting-Services / acs-aem-commons

http://adobe-consulting-services.github.io/acs-aem-commons/
Apache License 2.0

Scheduled content importer tool #216

Closed HanbingLiu closed 8 years ago

HanbingLiu commented 10 years ago

Some projects need to migrate a whole site to CQ, and some components are designed as rich text editors that render HTML hosted on the original server. That content may need to be updated regularly. We could provide a tool that can be scheduled to update all of the content.

justinedelson commented 10 years ago

@HanbingLiu is this describing something along the lines of the static replication agent, but scheduled and covering the full site?

davidjgonzalez commented 10 years ago

This sounds more like a polling job in AEM that reaches out to some 3rd party system, gets content, and writes it back into resources under AEM pages that are editable (in AEM) via RTEs... I think?

Maybe not a recurring job? Not sure what "scheduled" means in this context.

HanbingLiu commented 10 years ago

@justinedelson @davidjgonzalez Yeah, David is correct. We want to reach out to some 3rd party system and get content. "Scheduled" means this could be a recurring job: in some cases authors update the content in another system, so we need to regularly pull the updated content into AEM. The job would check the modified time of the file and compare it with the last modified time of the node. They may also need a list, after the job runs, showing which nodes were updated.
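The modified-time check and the "which nodes were updated" list described above could be sketched roughly as follows. This is plain Java with no AEM or JCR calls; `RefreshPlanner`, `ImportedNode`, and `staleNodes` are hypothetical names made up for illustration, and the source timestamps are assumed to have been collected already by whatever reads the file system or remote server.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of ACS AEM Commons: picks the nodes whose source is newer. */
final class RefreshPlanner {

    /** One imported node: where it lives, where its content came from, and when it was last written. */
    static final class ImportedNode {
        final String nodePath;
        final String sourcePath;
        final Instant nodeLastModified;

        ImportedNode(String nodePath, String sourcePath, Instant nodeLastModified) {
            this.nodePath = nodePath;
            this.sourcePath = sourcePath;
            this.nodeLastModified = nodeLastModified;
        }
    }

    /**
     * Returns the paths of nodes whose source file was modified after the node was last written.
     * The returned list doubles as the report of what a job run would update.
     */
    static List<String> staleNodes(List<ImportedNode> nodes, Map<String, Instant> sourceModified) {
        List<String> stale = new ArrayList<>();
        for (ImportedNode node : nodes) {
            Instant sourceTime = sourceModified.get(node.sourcePath);
            // A source with no known timestamp is skipped rather than guessed at.
            if (sourceTime != null && sourceTime.isAfter(node.nodeLastModified)) {
                stale.add(node.nodePath);
            }
        }
        return stale;
    }
}
```

In a real job the node timestamps would presumably come from a last-modified property on each node, but that wiring is left out here.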

davidjgonzalez commented 10 years ago

@HanbingLiu can you go into a little more detail on how this would work? How would a page (or component?) know where to pull the data from? Is it always an HTTP request? Or does it look at the filesystem? Would this recurring job run on Author, or Publish? etc.

HanbingLiu commented 10 years ago

@davidjgonzalez it's a requirement being discussed in a current project. Here is the detailed description.

• a CQ Component contains a path to a (Madcap HTML) file that resides somewhere on the filesystem (perhaps a mounted unix filesystem; maybe we can access via http, if they can provide that)
• The CQ Component can read that filesystem path and load the (X)HTML content into the CQ node, and save it.
• Periodically, an author may request that the CQ Component re-reads the content from that same path, to “refresh” the content, reloading it into the same CQ component.
• The Component renders the HTML like an ordinary Rich Text component.
• We’d like to invoke all of this within CQ, from within the Components, or services in CQ, but not from an external program, if possible.

The solution we're talking about is to create a component with a path field, pointing to either a file system path or an http:// path, and an RTE that stores the content fetched from that path. The first step is to create a load button to manually refresh the content when needed. The second step is to write a scheduler that, when triggered, first finds all the nodes that need to be updated and then, depending on the protocol, reads each one from a file or via an HTTP request. Then, if customers want, we can provide a list of updated nodes based on the last modified date. This would run on author; when the pages are ready, publish them.
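The "depending on the protocol" step could look something like the sketch below. It is plain Java with no AEM APIs. `SourceReader`, `kindOf`, and `read` are hypothetical names, and the choice of UTF-8 is an assumption, since the thread does not say how the source HTML is encoded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/** Hypothetical helper, not part of ACS AEM Commons: loads HTML from a file path or an HTTP URL. */
final class SourceReader {

    enum Kind { FILE, HTTP }

    /** Treats http:// and https:// values as remote; everything else as a filesystem location. */
    static Kind kindOf(String source) {
        String lower = source.toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return Kind.HTTP;
        }
        return Kind.FILE;
    }

    /** Reads the whole source as text. UTF-8 is assumed here, not taken from the thread. */
    static String read(String source) throws IOException {
        if (kindOf(source) == Kind.HTTP) {
            try (InputStream in = URI.create(source).toURL().openStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
        // Accept both plain paths (/mnt/docs/a.html) and file: URIs (file:///mnt/docs/a.html).
        Path path = source.startsWith("file:") ? Paths.get(URI.create(source)) : Paths.get(source);
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }
}
```

Both the manual load button and the scheduled job could call the same `read` method, so the two steps would differ only in how they are triggered.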

davidjgonzalez commented 10 years ago

@HanbingLiu This sounds similar-ish to the CQ5.5 external component (though that would make real-time HTTP calls back to surface other content). A few more general thoughts

Just trying to think of this feature in the context of a more general use case.

justinedelson commented 10 years ago

For what is being described here, you should be using a polling importer. If someone wants to contribute an importer which gets fed a file URL and simply takes the content of that file and saves it to a node, that would be fine with me.

pdimarco commented 10 years ago

I just want to add to this, as another client could use this feature. They use ATG for some common content, and CQ has to hit that server to create part of its content on pages. However, they want that hit to happen only when ATG has updated content, and to keep the page cached in the interim.

This also relates to flushing the dispatcher whenever that ATG content is updated. The issue with having ATG tell CQ that the content is new is that ATG doesn't know which pages in CQ use the content, which makes flushing the dispatcher files difficult, as well as refreshing the appropriate CQ page content. That said, the proposed component would be a solution to this challenge, because the page that contains the component can trigger the flush.

justinedelson commented 10 years ago

I don't understand the references in this thread to an RTE. If the imported content is editable through an RTE widget, how do you plan on ensuring that the edited content doesn't get overwritten during the next import?

davidjgonzalez commented 8 years ago

Use the polling importer; the rest of the use case does not seem generic enough for ACS Commons.