dimitri / pgcopydb

Copy a Postgres database to a target Postgres server (pg_dump | pg_restore on steroids)
Other
1.19k stars 78 forks source link

Store timelines in internal catalogs #848

Closed hanefi closed 3 months ago

hanefi commented 4 months ago

This is a WIP PR that still has some rough edges.

TODO:

dimitri commented 3 months ago
  • It does not look acceptable to include "schema.h" or "catalog.h" in file pgsaql.c. I'll try to find a workaround

Usually that involves creating a new C module that will include all the needed headers. Here, that would be something like a “pgsql_timeline.c” module I suppose.

  • Missing some catalog functions that can be used to populate data structures on current timeline. I do not see the value in storing a value that is not used anywhere.

At the moment we fetch the timeline mostly for DEBUG purposes. Supporting a TLI change in Logical Decoding is still an open topic in pgcopydb. We need to have enough information to decide if we can follow a TLI change, depending on what we replayed already in the past etc. So for the moment, we just store the information in a way that's easy to review/debug, but we not use the information anywhere yet.

dimitri commented 3 months ago

See #834

hanefi commented 3 months ago
  • It does not look acceptable to include "schema.h" or "catalog.h" in file pgsaql.c. I'll try to find a workaround

Usually that involves creating a new C module that will include all the needed headers. Here, that would be something like a “pgsql_timeline.c” module I suppose.

Thanks for the idea of a new module. Here are some information on how I created the new module:

  • Missing some catalog functions that can be used to populate data structures on current timeline. I do not see the value in storing a value that is not used anywhere.

At the moment we fetch the timeline mostly for DEBUG purposes. Supporting a TLI change in Logical Decoding is still an open topic in pgcopydb. We need to have enough information to decide if we can follow a TLI change, depending on what we replayed already in the past etc. So for the moment, we just store the information in a way that's easy to review/debug, but we not use the information anywhere yet.

I only store the current timeline details in memory now. After moving some portion of code to a new module, some of those changes may escape the attention of the reader/reviewer.

hanefi commented 3 months ago

I am surprised we need to define our own iterator. Expected that a callback to the file line iterator would do.

I had 2 main issues that were not easy to overcome if I used the current file line iterators, and file_iter_lines function.

Problem 1:

When parsing a line, I needed to store some values in the iterator, and it was not possible with the current file iterator. I believe we need to be able to pass custom iterators for that to be useful.

I considered updating the current implementation of file_iter_lines to accept an iterator instead of some fixed set of arguments that are used to construct the iterator in the function body. That would have worked out for me.

Problem 2:

I need to be able to call the callback function after all the lines are processed. This way, we are able to add a final timeline entry for the current timeline. This was not easy to accomplish in the current implementation of file_iter_lines.


Below is a comment from @dimitri as he mistakenly edited the original comment instead of quoting it in a new comment:

I had 2 main issues that were not easy to overcome if I used the current file line iterators, and file_iter_lines function.

It seems to me that there is still confusion about the responsibilities of the iterator parts of the code, which should only know how to read a line at a time, and the callback/context side of things, where I expected we would deal with prevend and all and have all the needed information for the last entry, after the iterator is finished.

It might be my confusion though, so I will have another look later today and maybe try to actually write it the way I though. Sometimes it's the only way to understand the difference between ideas and reality...

dimitri commented 3 months ago

Thanks for pushing all the work Hanefi, I like what we ended-up with!