cjcodeproj / medialibrary

Python code to read XML media files
MIT License

Output format framework #178

Open cjcodeproj opened 4 months ago

cjcodeproj commented 4 months ago

For the tools that generate output reports, most of the processing is based on loading and sorting the data. The output part is easy, and most of that code is in other modules.

Consider an API framework that abstracts the output even further, so that a tool like media.tools.movies.list could handle multiple output formats with very little change to the code base.

cjcodeproj commented 4 months ago

Additional Notes

cjcodeproj commented 2 months ago

More Notes

Code should work something like this.

formatter = FormatterClass.new(Style.TEXT)
formatter.format(Output.LIST)
output = formatter.headers()
output += formatter.format_batch(in_batch)
output += formatter.close()
print(output)

There could be other methods, but the idea is that the code could output either plain text or HTML without any code change at the application level. The code should also automatically change its behavior based on the type of content that is being output.
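
To make the calling pattern above concrete, here is a minimal, self-contained sketch. Every name in it (Style, ListFormatter, render) is a hypothetical stand-in and not part of the medialibrary code; it only shows that the application-level function stays the same while the style changes.

```python
from enum import Enum
from html import escape


class Style(Enum):
    """Hypothetical output styles; the names are assumptions."""
    TEXT = "text"
    HTML = "html"


class ListFormatter:
    """Illustrative formatter: same calls, different markup per style."""

    def __init__(self, style):
        self.style = style

    def headers(self):
        # Opening markup, if the style needs any.
        return "<ul>\n" if self.style is Style.HTML else ""

    def format_batch(self, titles):
        # One output line per title in the batch.
        if self.style is Style.HTML:
            return "".join(f"  <li>{escape(t)}</li>\n" for t in titles)
        return "".join(f"{t}\n" for t in titles)

    def close(self):
        # Closing markup, if the style needs any.
        return "</ul>\n" if self.style is Style.HTML else ""


def render(style, titles):
    """Application-level code: identical for every style."""
    formatter = ListFormatter(style)
    return formatter.headers() + formatter.format_batch(titles) + formatter.close()


print(render(Style.TEXT, ["Condorman", "Snake Movie"]), end="")
print(render(Style.HTML, ["Condorman", "Snake Movie"]), end="")
```

Switching formats only changes the Style value passed in; render() itself is untouched.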

cjcodeproj commented 2 months ago

More Notes

High Level Notes On How A List Tool Currently Works

Based on the code in media.tools.movies.list and media.tools.audio.albums.list

How The Formatter Should Be Integrated, and how it should work

cjcodeproj commented 2 months ago

Matrix

Output Format | Object Class | Report Type
text | Movie | List Entry
html | Album | Mini List Entry
markdown | Song | Detailed Entry
Essay

The trick will be to make the code extensible to handle all types of content, flexible to handle all output formats, but also still support a polymorphic interface.

The different output formats will probably be handled by some kind of Driver level class, where the formatter references the driver, and each driver adheres to a similar protocol for output calls.
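
One possible shape for the formatter/driver split, as a rough sketch only. The class names, the DRIVERS registry, and the heading()/line() calls are all assumptions made for illustration; the point is that the formatter knows about movies and the drivers only know about markup.

```python
from html import escape


class TextDriver:
    """Plain text driver: knows nothing about movies or albums."""

    def heading(self, text):
        return f"{text}\n{'=' * len(text)}\n"

    def line(self, text):
        return f"{text}\n"


class HtmlDriver:
    """HTML driver exposing the same two calls as TextDriver."""

    def heading(self, text):
        return f"<h1>{escape(text)}</h1>\n"

    def line(self, text):
        return f"<p>{escape(text)}</p>\n"


# Hypothetical registry mapping a format name to a driver class.
DRIVERS = {"text": TextDriver, "html": HtmlDriver}


class MovieListFormatter:
    """Content-aware layer: holds a driver and delegates markup to it."""

    def __init__(self, driver_name):
        self.driver = DRIVERS[driver_name]()

    def format(self, heading, movies):
        out = self.driver.heading(heading)
        for title, genre in movies:
            out += self.driver.line(f"{title} [{genre}]")
        return out


print(MovieListFormatter("text").format("Movies", [("Condorman", "Action")]), end="")
```

Supporting another object class would mean another formatter; supporting another output format would mean another driver.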

cjcodeproj commented 2 months ago

More Notes

The output format driver should be the dumbest code of them all. No knowledge of the data it's outputting, just helper functions that are designed to pad and alter the values into suitable output.

We're taking things for granted, like alignment/justification, and column widths (which just magically work when you're dealing with tables). Also things like plain text output using fixed width fonts, which will become an issue when formats like HTML come into play.
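
For plain text, the kind of helper a driver would need is probably no more than this. pad_cell is a hypothetical name, and the sketch assumes a fixed width font where one character is one column.

```python
def pad_cell(value, width, align="left"):
    """Truncate or pad value so it is exactly width characters long."""
    text = str(value)[:width]
    if align == "right":
        return text.rjust(width)
    if align == "center":
        return text.center(width)
    return text.ljust(width)


# Two cells of a fixed width row: title on the left, length on the right.
row = pad_cell("Condorman", 20) + pad_cell("1:00:00", 10, "right")
print(repr(row))
```

An HTML driver would not pad at all; it would more likely turn the same alignment value into an attribute or style on the cell.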

cjcodeproj commented 2 months ago

Notes: Tables

The driver building a table should take two parameters:

  1. A list of all column headings (should also allow for empty columns)
  2. A format spec that covers justification/alignment and size.
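
A small sketch of those two parameters for the plain text case. It assumes the format spec can simply be Python's own format spec mini-language (for example '<20' for left aligned, 20 wide); format_row is a hypothetical helper, not existing code.

```python
def format_row(values, specs):
    """Render one plain text row.

    values: cell or heading strings; an empty string gives an empty column.
    specs:  one format spec per column, e.g. '<20' (left, 20 wide)
            or '>10' (right, 10 wide).
    """
    if len(values) != len(specs):
        raise ValueError("one format spec is needed per column")
    cells = (format(value, spec) for value, spec in zip(values, specs))
    # Trailing padding on the last column is dropped.
    return " ".join(cells).rstrip()


specs = ["<20", "<3", ">10"]
print(format_row(["Title", "", "Length"], specs))
print(format_row(["Condorman", "", "1:00:00"], specs))
```

The same spec list is reused for the heading row and the data rows, which keeps the columns lined up.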

cjcodeproj commented 1 month ago

Notes: Options/Flexibility

There are 3 output formats supported in the test code: plain text, CSV, and HTML. Each driver outputs a table with a list of movies.

There are pros and cons to each format. Plain text has a fixed table size, whereas an HTML table adapts to the field length. In plain text, field width is determined by the number of characters, but HTML has options for widths using units like inches, ems, and pixels. On the other hand, HTML doesn't natively support output of a fixed-width floating point value.

There should be a mechanism where drivers have feature flags that can identify things that the driver is capable of.

There should also be features to do things like indent the table output by 10 character positions.
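
Feature flags could be as simple as an enum.Flag value on each driver. The flag names and driver classes below are assumptions for illustration; only enum.Flag and textwrap.indent are real standard library pieces.

```python
import textwrap
from enum import Flag, auto


class Feature(Flag):
    """Hypothetical capability flags a driver could advertise."""
    FIXED_WIDTH = auto()     # columns have a fixed character width
    FLEXIBLE_WIDTH = auto()  # columns grow to fit their content
    CSS_UNITS = auto()       # widths may use units such as em or px
    INDENT = auto()          # the whole table can be shifted right


class TextDriver:
    features = Feature.FIXED_WIDTH | Feature.INDENT


class HtmlDriver:
    features = Feature.FLEXIBLE_WIDTH | Feature.CSS_UNITS


def indent_output(driver, output, positions=10):
    """Indent every line if the driver supports it, else return as-is."""
    if Feature.INDENT in driver.features:
        return textwrap.indent(output, " " * positions)
    return output


print(indent_output(TextDriver, "Title     Genre\nCondorman Action\n"), end="")
```

The caller asks the driver what it can do instead of checking which format it is.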

cjcodeproj commented 1 month ago

Notes: Section heads and HTML element body

Tables where rows are grouped into batches should make use of the HTML <tbody> element. The add_row() method should probably be supplemented by an add_row_batch() method to accommodate this.

The table should have at least one <tbody> element, regardless of the structure of the rows, so the code will need to keep track of the rows as they are added. It should probably be a simple method that counts the rows as they are added.
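
A rough sketch of how that bookkeeping might look. HtmlTableBody is a hypothetical class; add_row() and add_row_batch() follow the method names mentioned above, and everything else is an assumption.

```python
from html import escape


class HtmlTableBody:
    """Collects rows and emits them grouped into <tbody> elements."""

    def __init__(self):
        self.batches = []   # each entry is a list of rows
        self.row_count = 0  # running total across all batches

    def add_row(self, *cells):
        # Loose rows join the most recent batch; open one if none exists.
        if not self.batches:
            self.batches.append([])
        self.batches[-1].append(cells)
        self.row_count += 1

    def add_row_batch(self, rows):
        # Each batch becomes its own <tbody>.
        self.batches.append([tuple(row) for row in rows])
        self.row_count += len(self.batches[-1])

    def render(self):
        out = []
        # An empty table still gets a single, empty <tbody>.
        for batch in self.batches or [[]]:
            out.append("<tbody>")
            for row in batch:
                cells = "".join(f"<td>{escape(str(c))}</td>" for c in row)
                out.append(f"  <tr>{cells}</tr>")
            out.append("</tbody>")
        return "\n".join(out) + "\n"


body = HtmlTableBody()
body.add_row("Condorman", "1:00:00", "Action")
body.add_row_batch([("Catch The Last Train", "2:00:00", "Western")])
print(body.render(), end="")
```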

cjcodeproj commented 1 month ago

Notes: OOP and Protocols/Interfaces

Right now all of the drivers have an identical class and method structure, but there is almost no shared code between them. If the code base were Objective-C, Swift, or Java, the classes would all adhere to a protocol specification.

Protocols don't come into play in Python unless you're also doing typing. It should be a future consideration to create protocol definitions when typing is implemented.
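
For reference, a protocol definition with typing.Protocol could look roughly like this. TableDriver and CsvTable are hypothetical names, and the two methods are only an assumed subset of what a driver table would expose.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TableDriver(Protocol):
    """Hypothetical protocol covering two table calls."""

    def add_row(self, *cells: str) -> None: ...

    def finish(self) -> None: ...


class CsvTable:
    """Satisfies TableDriver structurally; no inheritance is needed."""

    def __init__(self) -> None:
        self.output = ""

    def add_row(self, *cells: str) -> None:
        # Simplified: no quoting; a real driver would use the csv module.
        self.output += ",".join(cells) + "\n"

    def finish(self) -> None:
        pass


def fill(table: TableDriver) -> None:
    """Accepts any object that has the protocol's methods."""
    table.add_row("Condorman", "1:00:00", "Action")
    table.finish()


table = CsvTable()
fill(table)
print(table.output, end="")
```

A static type checker would flag a driver that is missing one of the methods; at runtime, isinstance() against a runtime_checkable protocol only checks that the methods exist, not their signatures.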

cjcodeproj commented 1 month ago

Test Code

Test code implementation currently looks like this:

from media.fmt.driver.generic import TableColumnSpec, TableColumnAlign
from media.fmt.driver.selector import Selector

driver = Selector.load_driver('text')
table = driver.get_table()

table.add_column(TableColumnSpec('Title',20))
table.add_column(TableColumnSpec('Length',10))
table.add_column(TableColumnSpec('Genre',15))

table.start()
table.headers()
table.add_row('Condorman','1:00:00','Action')
table.add_row('Snake Movie','1:01:00','Drama')
table.add_subhead('New movies')
table.add_row('Catch The Last Train','2:00:00','Western')
table.add_row('Saddlebag Full Of Bullets','1:30:00','Western')
table.finish()

print(table.output,end='')

There are output mechanisms for plain text, HTML, and CSV formats. The Markdown format was dropped because Markdown can support embedded HTML, and the syntax isn't flexible enough to handle some use cases. In this code example, changing the output format only requires a change to a single line of code.

cjcodeproj commented 1 month ago

Notes: Where is the output cached?

There are 3 layers to the code.

  1. The driver layer that does the raw work of building a table
  2. The middle layer that organizes the movie data
  3. The application that calls the middle layer, and passes the movie data.

Right now the driver layer caches the data, and returns it all as a single string. But should it? Or should the middle layer capture all the data and return it as a single string?

If we want to keep the drivers simple, then the output should be preserved at the middle layer. Does the driver layer need to maintain state? It could be helpful to track the number of rows, but not 100% sure if it's needed.

On the other hand, when rendering an HTML table, there is a need to track column and row information for things like cell or header id. Also, if the table ever has a <tfoot> block, its position matters: HTML 4 required it to follow the <thead> and precede the <tbody>, while the current HTML standard places it after the <tbody> elements. So, the driver needs to know what the entire table is like in order to get those elements output in the right order.
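
A sketch of why holding the whole table helps: the sections can be collected in any order and only put in sequence at output time. assemble_table and its foot_before_body switch are hypothetical. The default order (thead, tbody, tfoot) follows the current HTML standard; the switch gives the HTML 4 order with the footer ahead of the body.

```python
def assemble_table(head_rows, bodies, foot_rows, foot_before_body=False):
    """Join already-rendered <tr> strings into one <table> string.

    head_rows and foot_rows are lists of <tr> strings.
    bodies is a list of such lists, one per <tbody>.
    """
    head = ["<thead>", *head_rows, "</thead>"] if head_rows else []
    foot = ["<tfoot>", *foot_rows, "</tfoot>"] if foot_rows else []
    body = []
    for rows in bodies or [[]]:  # always emit at least one <tbody>
        body += ["<tbody>", *rows, "</tbody>"]
    middle = foot + body if foot_before_body else body + foot
    return "\n".join(["<table>", *head, *middle, "</table>"]) + "\n"


print(assemble_table(
    ["<tr><th>Title</th></tr>"],
    [["<tr><td>Condorman</td></tr>"]],
    ["<tr><td>1 movie</td></tr>"],
), end="")
```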

cjcodeproj commented 1 month ago

Coding Notes:

The following sample code (not committed) can generate a full HTML list of movies.

#!/usr/bin/env python

# Test program to output a list

import media.fileops.repo
from media.generic.sorting.organizer import Organizer
from media.generic.sorting.batch import Batch
from media.fmt.content.movie.list import TableList

repo = media.fileops.repo.Repo('/home/chrisj/xml/m/internal-db')
repo.scan()
repo.load()

print(f"<!-- {len(repo.media)} -->")

movies = repo.get_movies()
organizer = Organizer(movies)
batches = organizer.create_batches(None)

print(f"<!-- {len(batches)} -->")

tl1 = TableList()
tl1.setup('html')
tl1.batch(batches[0])
print(tl1.get_output())

The middle layer object is the TableList class, which organizes the data and then uses a driver class just for generating the table HTML code.

cjcodeproj commented 2 days ago

Notes: Big Changes

Tables are objects, but they have no output formatting functionality.

There are separate formatters for plaintext, HTML, and CSV output.

One table can be passed to multiple formatters.

Sample code

from media.fmt.structure.table import Table, TableColumnSpec
from media.fmt.formatter.selector import Selector

html_formatter = Selector.load_driver('html')
html_table = html_formatter.get_table()

pt_formatter = Selector.load_driver('plaintext')
pt_table = pt_formatter.get_table()

csv_table = Selector.load_driver('csv').get_table()

t = Table()

t.add_column(TableColumnSpec('Title',20))
t.add_column(TableColumnSpec('Length',10))
t.add_column(TableColumnSpec('Genre',15))

t.start()
t.add_row('Condorman','1:00:00','Action')
t.add_row('Generic Snake Movie','1:01:00','Drama')
t.add_body()
t.set_body_header('New movies')
t.add_row('Catch The Last Train','2:00:00','Western')
t.add_row('Saddlebags Full Of Danger','1:30:00','Western')
t.finish()

out = html_table.render(t)
out2 = pt_table.render(t)

print(out)
print(out2)
print(csv_table.render(t))

Classes under the structure package contain the data. Classes under the formatter package handle the output.
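
As a standalone illustration of that split (not the actual media.fmt classes), the sketch below keeps the data in a plain dataclass and puts all output logic in separate formatter classes. The class names and the (name, width) column tuples are assumptions.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Table:
    """Data only: (name, width) column pairs and rows; no output logic."""
    columns: list = field(default_factory=list)
    rows: list = field(default_factory=list)


class PlainTextFormatter:
    """Fixed width output driven by the column widths."""

    def render(self, table):
        header = "".join(name.ljust(width) for name, width in table.columns)
        lines = [header.rstrip()]
        for row in table.rows:
            cells = (str(cell)[:width].ljust(width)
                     for cell, (_, width) in zip(row, table.columns))
            lines.append("".join(cells).rstrip())
        return "\n".join(lines) + "\n"


class CsvFormatter:
    """CSV output; column widths are ignored."""

    def render(self, table):
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow([name for name, _ in table.columns])
        writer.writerows(table.rows)
        return buffer.getvalue()


table = Table(columns=[("Title", 20), ("Genre", 15)],
              rows=[("Condorman", "Action"), ("Generic Snake Movie", "Drama")])

# The same table object goes to both formatters unchanged.
print(PlainTextFormatter().render(table), end="")
print(CsvFormatter().render(table), end="")
```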