Raku/doc-website

Generate sqlite output from RakuDoc source input #359

Open · dontlaugh opened this issue 5 months ago

dontlaugh commented 5 months ago

This issue is spawned from #75

The objective here is to create a schema and populate it with the data attributes parsed from https://github.com/Raku/doc

The resulting sqlite database could power relational queries, such as listing all classes that implement a given role, or all methods on a class along with whether they come from roles, and so on. A first sketch follows.
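
For concreteness, a minimal sketch of the idea, assuming the DBIish module with its SQLite driver; the type/does schema and every column name here are invented for illustration, not a proposal:

    use DBIish;

    my $dbh = DBIish.connect('SQLite', database => 'rakudoc.sqlite');

    # Hypothetical schema: documented types, plus class-does-role edges.
    $dbh.execute(q:to/SQL/);
        CREATE TABLE IF NOT EXISTS type (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            kind TEXT NOT NULL           -- 'class' or 'role'
        )
        SQL
    $dbh.execute(q:to/SQL/);
        CREATE TABLE IF NOT EXISTS does (
            class_id INTEGER REFERENCES type(id),
            role_id  INTEGER REFERENCES type(id)
        )
        SQL

    # "All classes that implement a role" is then a join:
    my $sth = $dbh.execute(q:to/SQL/, 'Iterable');
        SELECT c.name
        FROM type c
        JOIN does d ON d.class_id = c.id
        JOIN type r ON r.id = d.role_id
        WHERE r.name = ?
        SQL
    .say for $sth.allrows;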

A sqlite database could also serve as an intermediate format for building the website itself. E.g. the middle part of the Collection framework's pipeline could be simplified or extended if we can consistently parse our pod6 files into a relational database.

Somewhat related issues:

finanalyst commented 5 months ago

The first step - I suggest - would be to create a routines.sql file from the data gathered to generate the routines page.
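
For instance (field names invented, purely to make the shape concrete), the generator could boil the gathered routine data down to INSERT statements:

    # Hypothetical sketch: emit routines.sql from the data already
    # gathered for the routines page. All field names are invented.
    my @routines =
        { :name<grep>, :type<method>, :origin<Any>   },
        { :name<push>, :type<method>, :origin<Array> };

    my $sql = @routines.map({
        "INSERT INTO routine (name, type, origin) VALUES "
        ~ "('{.<name>}', '{.<type>}', '{.<origin>}');"
    }).join("\n");

    'routines.sql'.IO.spurt: $sql ~ "\n";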

patrickbkr commented 5 months ago

Thinking big, this could also be a good starting point for a rakudoc command-line doc browser.

finanalyst commented 5 months ago

@patrickbkr Let's see how the first step goes, but if routines works well, then we can put all the data from the Search function into SQL next. And then, locally, maybe have a Cro app running, so that SQL results, in the form of HTTP links to individual pages, are served locally.
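
To make that concrete, the local app might be as small as this sketch; search-db is a stub, and the route shape is only a guess:

    use Cro::HTTP::Router;
    use Cro::HTTP::Server;

    # Stub: would query the sqlite database, returning (title, url) pairs.
    sub search-db(Str $q) { ... }

    my $app = route {
        get -> 'search', :$q! {
            my @hits = search-db($q);
            content 'text/html',
                @hits.map(-> ($title, $url) {
                    "<a href=\"$url\">$title</a>"
                }).join('<br>');
        }
    }

    my $server = Cro::HTTP::Server.new:
        :host<localhost>, :port(10000), application => $app;
    $server.start;
    react whenever signal(SIGINT) { $server.stop; done }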

dontlaugh commented 5 months ago

This full-text search module is available: https://sqlite.org/fts5.html. I've never used it myself.
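
From a skim of those docs, usage boils down to a virtual table plus MATCH queries. Wired through DBIish it might look like this (untested; only the FTS5 SQL itself comes from the sqlite documentation):

    use DBIish;
    my $dbh = DBIish.connect('SQLite', database => 'rakudoc.sqlite');

    # FTS5 virtual table over searchable page text (columns invented).
    $dbh.execute('CREATE VIRTUAL TABLE IF NOT EXISTS page_fts USING fts5(title, body)');

    $dbh.execute('INSERT INTO page_fts (title, body) VALUES (?, ?)',
        'Hashes', 'A Hash is a mapping from keys to values ...');

    # MATCH gives full-text results, ranked by relevance.
    my $sth = $dbh.execute(
        'SELECT title FROM page_fts WHERE page_fts MATCH ? ORDER BY rank',
        'keys');
    .say for $sth.allrows;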

dontlaugh commented 2 months ago

If - during the build - we can identify hyperlinks (whether relative or external), it would be useful to toss them into a url table. That table could then drive an automated link-check test, sketched below. Refs https://github.com/Raku/doc/issues/4476
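
Sketching it (schema invented for illustration, reusing a DBIish connection as in the sketches above): the build records every link it encounters, and the link-check test just walks the table.

    # Hypothetical url table, populated during the build.
    $dbh.execute(q:to/SQL/);
        CREATE TABLE IF NOT EXISTS url (
            href        TEXT NOT NULL,
            source_file TEXT NOT NULL,
            is_external INTEGER NOT NULL   -- 1 for http(s), 0 for relative
        )
        SQL

    # The automated link-check test then iterates the table:
    my $sth = $dbh.execute('SELECT DISTINCT href FROM url WHERE is_external = 1');
    for $sth.allrows -> ($href) {
        # placeholder: issue an HTTP request and flag non-2xx responses
        say "would check: $href";
    }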


That's just one use case. There are many. I'd like to start re-architecting the build to pivot around a normalized sqlite database. Here is a diagram.

[diagram: a "DB Creation script" compiles rakudoc sources into a central sqlite database, which feeds downstream processes A, B, C, and D]

If we pull this off, the advantage will be that we can decouple each of the downstream processes that depend on the parsed data (A, B, C, D in the diagram). SQLite supports multiple processes reading simultaneously, so we can run them in parallel to speed things up significantly. I also think it will be easier for contributors (but I could be wrong, who knows).

The challenge is that the "DB Creation script", the build process's entry point, becomes more complex. Effectively this component is a compiler from rakudoc to SQL statements:

rakudoc -> intermediate representations (IR) -> SQL statements

In principle, if we capture and normalize all the rakudoc source material, we can drive any downstream use case: static websites, man pages, tests, offline search.
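
A skeleton of that compiler, just to pin down its shape (every name here is hypothetical, and parse-rakudoc is the hard, unwritten part):

    # Hypothetical skeleton of the DB creation script.
    sub parse-rakudoc(IO::Path $file) { ... }    # rakudoc -> IR (stub)

    # IR -> SQL statements; assumes IR nodes with .type and .content.
    sub ir-to-sql(@nodes, IO::Path $file --> Str) {
        @nodes.map({
            "INSERT INTO node (file, type, content) VALUES "
            ~ "('$file', '{.type}', '{.content}');"
        }).join("\n")
    }

    # Non-recursive for brevity; the real thing would walk subdirectories.
    my @sql = gather for dir('doc', test => *.ends-with('.rakudoc')) -> $file {
        take ir-to-sql(parse-rakudoc($file), $file);
    }
    'build.sql'.IO.spurt: @sql.join("\n");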

finanalyst commented 2 months ago

@dontlaugh This is a very significant change. Honestly, I haven't thought of the build process in this way, and it will take me a while to think through how to do this. I think your rakudoc -> internal rep -> SQL normalisation is rather more difficult underneath than it seems.

Currently, I am implementing a renderer for RakuDoc v2. (It should be noted that RakuDoc v1 was never properly implemented.) The new renderer works directly with the AST of the source files, not with the compiled $=pod variable. The practical result is that the AST is produced about 6x faster than the compiled $=pod. In addition, we may be able to eliminate the 'caching' step, which can take about five minutes.

However, the AST representation of each source file could possibly be the sort of internal representation you are looking for.

dontlaugh commented 2 months ago

> the AST representation of each source file could possibly be the sort of internal representation you are looking for

It sounds like it. I imagine that the complex program on the left-hand side of the diagram would be best implemented by a library that works with a proper AST.

The data we need in normalized form will look different from an in-memory AST, but as long as each AST node can be serialized as text or bytes, we can store both in different database tables.
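
Something like this, say, where the raw serialized node sits next to the normalized columns (schema hypothetical; a node's .raku representation is just one candidate for the raw form):

    # Hypothetical: raw serialized AST nodes alongside normalized fields,
    # so the normalized tables can always be rebuilt from the raw column.
    $dbh.execute(q:to/SQL/);
        CREATE TABLE IF NOT EXISTS ast_node (
            id        INTEGER PRIMARY KEY,
            file      TEXT NOT NULL,
            node_type TEXT NOT NULL,    -- normalized, cheap to query
            raw       TEXT NOT NULL     -- e.g. the node's .raku serialization
        )
        SQL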

> The new renderer works directly with the AST of the source files

I am pleased to hear that. What library is parsing the rakudoc into an AST? Can you link to it, even if it is in its nascent stages?

> This is a very significant change.

I agree. This is in the high-effort but (potentially) high-reward category.

finanalyst commented 2 months ago

@dontlaugh The new Rakudo compiler creates an AST for all programs, and the AST can be manipulated. Although the Rakudo AST compiler has not yet completely landed - this will be the raku.e milestone - there is sufficient support for RakuDoc, and that has been backported into raku.d.

As an example, if you take a recent version of Raku, e.g. 2024.04, and run the following in a terminal, assuming the current directory is doc/Language/ in a local clone of Raku/doc, you will get the AST of the file:

    raku -e 'say "101-basics.rakudoc".IO.slurp.AST'

The new bit is the .AST method, which returns the AST of the input string.
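
From there the tree can be queried for its RakuDoc parts; the .rakudoc accessor below is an assumption (unverified) about how the blocks are exposed:

    # Unverified sketch: assumes the returned AST exposes its RakuDoc
    # blocks via a .rakudoc accessor.
    my $ast = '101-basics.rakudoc'.IO.slurp.AST;
    for $ast.rakudoc -> $doc {
        say $doc.^name;    # e.g. RakuAST::Doc::Block
    }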

dontlaugh commented 1 month ago

Saw this in Rakudo Weekly: the Graph package may prove useful here: https://raku.land/zef:antononcube/Graph