alfredbaudisch / pardall_markdown

Reactive publishing framework, filesystem-based with support for Markdown, nested hierarchies, and instant content rebuilding. Written in Elixir.
Apache License 2.0
116 stars 8 forks source link

feat(lib): add support for git repositories as data source #49

Closed rockchalkwushock closed 3 years ago

rockchalkwushock commented 3 years ago

What does this PR do?

Adds support for users to use git repositories as a data source for PardallMarkdown.

Example

config :pardall_markdown, PardallMarkdown.Content,
  root_path: "https://github.com/rockchalkwushock/codybrunner.dev.git",

Closes #31

alfredbaudisch commented 3 years ago

This is a very good start, thanks for sharing. We still have some work to do, the repository provider has to be connected to a watcher, the watcher also needs to pool the repository (as opposed to the current FileWatcher which detects file changes on the fly), and then when getting new content, parse it with the FileParser through the current PardallMarkdown workflow.

blogit/server.ex has a good start for a Repository watcher. It needs to be adjusted into PardallMarkdown but the main core is there.

Our current FileWatcher and the new RepositoryWatcher can be loaded and started side by side. Then, RepositoryWatcher could clone the files into root_path and then FileWatcher picks up the changes (as it currently is). By following this, RepositoryWatcher won't have to talk to FileParser and we can rely on the current implementation.

Do you want to give it a try? Otherwise I can do this next step in the weekend.

rockchalkwushock commented 3 years ago

Hi @alfredbaudisch I'd like to take another shot at implementing the RepositoryWatcher. I likely won't get a chance until tomorrow evening or Thursday evening at the earliest. I should have a rough implementation up before this weekend though.

Thank you for your guidance and patience. It's challenging, but also exciting to be contributing to an Elixir project! Even more so a project I am absolutely going to use in the coming weeks for a side project!

rockchalkwushock commented 3 years ago

@alfredbaudisch I am working on the repository_watcher now. My assumption is that when you mean the watchers can be started "side-by-side" the GenServers will be executed in the application.ex supervisor tree. I apologize if this gets trivial my OTP is pretty noob level.

# application.ex
def start do 
  children = [
    ...
    # Start the repository_watcher.ex first
    # How does the GenServer respond if no config value provided? 
    # Do we just not invoke the watcher?
    # Would it make sense to look for the config here an provide it to the keyword list?
    # { ..., remote: Application.get_env(:pardall_mardown, :remote_source) }
    {PardallMarkdown.RepositoryWatcher, name: PardallMarkdown.RepositoryWatcher},
    {PardallMarkdown.FileWatcher, name: PardallMarkdown.FileWatcher, dirs: [PardallMarkdown.Content.Utils.root_path()]]},
    ...
  ]
  ...
  Supervisor.start_link(children, opts)
end

With PardallMarkdown running as an application inside the mix project if a remote_source is provided the @local_path in the git module needs to be an absolute path "relative" to the environment (local machine or container) and that becomes the root_path for FileWatcher and FileParser to take over from correct?

Now it is my understanding that when a process is created on the beam those processes are isolated from one another so I am slightly confused as to how the FileWatcher is going to be able to listen for changes on RepositoryWatcher. I unfortunately have not had time to work on this side project much any more, but this post is where my understanding of GenServer and OTP stopped.

From your YT video PardallMarkdown is "framework agnostic". You don't have to use it with Phoenix. It can be used in a standalone mix project with no problem so adding something like the Phoenix.PubSub dependency I would assume is not something you would want to do here since that (assumption) would lock the package into Phoenix and require more dependencies. I am not sure in the ecosystem or internally what Elixir provides for passing state between two isolated processes (or perhaps there is an underlying Erlang module that can help with this?)

RepositoryWatcher POC

defmodule PardallMarkdown.RepositoryWatcher do
  use GenServer
  require Logger

  alias PardallMarkdown.RepositoryProvider, as: Repository

  @recheck_interval Application.get_env(:pardall_markdown, PardallMarkdown.Content)[:recheck_pending_remote_events_interval]

  def start_link(args) do
    GenServer.start_link(__MODULE__, args, args)
  end

  def init(args) do
    # Fetch the repository or create a local git repo if remote source not found.
    Repository.repository()

    # Is it possible to update the config at runtime so the `root_path` is updated?
    # If not how does the FileWatcher subscribe to the path being created here?
    # My assumption is that if a `remote_source` is provided and we are executing in Mix.env() == :dev
    # the local_path is pointing to the local git repository and any polling of the remote is disabled.
    # When executing in Mix.env() == :prod then the callbacks associated with listening on the remote are enabled.

    # Start Polling
    schedule_next_recheck()

     # Send out message on the initialized state of the GenServer.
     {:ok, %{}}
  end

  def handle_info() when Mix.env() == :dev, do: # do things with the local git repository "FileWatcher"
  def handle_info() when Mix.env() == :prod, do: # do things with remote_source "FileWatcher + RepositoryWatcher"

  defp schedule_next_recheck, do: Process.send_after(self(), :check_pending_events, @recheck_interval)
end

Other Thoughts Not Relevant To the Above 😂

Also just a thought, but I would figure that this feature could extend past just git providers. For example CMS providers like Contentful et.al. Perhaps this should be renamed to remote_provider and remote_watcher. Under the remote_providers/ there can be implementations for git as well as what I am sure would require specific modules for each CMS.

Thank you for your patience with a noobie 🙏🏻

alfredbaudisch commented 3 years ago

Nice questions, @rockchalkwushock.

so adding something like the Phoenix.PubSub dependency I would assume is not something you would want

It's because it's not necessary.

I am slightly confused as to how the FileWatcher is going to be able to listen for changes on RepositoryWatcher

I would figure that this feature could extend past just git providers.

Yes, but then we would implement something like webhooks, so the headless CMS can call PardallMarkdown webhooks and we refresh the content. But this is separate from implementing the Git provider. We can also always refactor later.

Your OPT remarks in how to start the RepositoryWatcher

Use the same format as FileWatcher is started, passing configuration at runtime, either via a helper function or directly via Application.get_env. Put RepositoryWatcher after FileWatcher.

      {
        PardallMarkdown.FileWatcher,
        name: PardallMarkdown.FileWatcher, dirs: [PardallMarkdown.Content.Utils.root_path()]
      }

Is it possible to update the config at runtime so the root_path is updated? If not how does the FileWatcher subscribe to the path being created here?

I think this is answered above, but let's explain the order of execution of a PardallMarkdown application:

Fresh Start

Lifecycle

rockchalkwushock commented 3 years ago

@alfredbaudisch sorry for the late reply I am just now seeing this. I will see if I can get around to writing the implementation tomorrow for this.

rockchalkwushock commented 3 years ago

@alfredbaudisch I think I've just about got it where it needs to be. There are a few known bugs at the moment

  1. RepositoryWatcher will crash if the local_path already exists. Pretty easy condition check to fix this, but my question for you is should that check live in the Git.git_repository/0 function or should that check be made inside the RepositoryWatcher.init/1?
  2. I am still trying to manually mock that the fetch polling is working.
alfredbaudisch commented 3 years ago

Thank you! Gonna check it out by the weekend.

alfredbaudisch commented 3 years ago

@rockchalkwushock I committed the final changes needed to make the feature work. My changes also answer your last questions.

I greatly simplified repository_providers/git.ex, and added the necessary fixes/additions to make it correctly work with :root_path and with the existing path/non-existing conditions.

RepositoryWatcher is loaded and started only in case a remote_repository_url is provided.

FileWatcher correctly ignores .git files and do not send files to be rebuilt in case of no changes in the content itself.

Do you want to add something more? If no, then it is ready to be merged.

The CI build fail is not related to this feature, so we can ignore it.

rockchalkwushock commented 3 years ago

@alfredbaudisch looking at this now

rockchalkwushock commented 3 years ago

@alfredbaudisch everything looks great! Good catch with the hidden files I had not considered that. I also was racking my brain with how to conditionally start the RepositoryWatcher if no remote_url was provided. My JavaScript brain kept saying if/else...still learning how to think the "Elixir Way". Thanks for all your help with getting this to the finish line.

Merge when ready

alfredbaudisch commented 3 years ago

@rockchalkwushock well done. Published https://hex.pm/packages/pardall_markdown/0.4.0 - you are also mentioned in the README and the README has instructions regarding to this new feature.