IBM / data-prep-kit

Open source project for data preparation of LLM application builders
https://ibm.github.io/data-prep-kit/
Apache License 2.0

Github doc pages hyperlinks and formatting issues #43

Closed nirmdesai closed 2 months ago

nirmdesai commented 5 months ago

The hyperlinks on this doc page end up downloading the .py files instead of navigating the browser to the specific code file in the repo: https://ibm.github.io/data-prep-lab/data-processing-lib/doc/overview/

Same issue exists for this page and probably other such pages: https://ibm.github.io/data-prep-lab/data-processing-lib/doc/transform-tutorials/

The numbered list on this page is not formatted correctly, and the PowerPoint diagram looks odd because grammar-check highlights are visible in it: https://ibm.github.io/data-prep-lab/data-processing-lib/doc/architecture/

shahrokhDaijavad commented 5 months ago

@nirmdesai Pages created via MkDocs need manual fixing of links with relative paths. We are aware of this and I have asked Shivdeep to do this.

daw3rd commented 5 months ago

And to be clear, these links work when viewing from github.com, just seems mkdocs is doing something wrong. For example, the .py links work as expected from https://github.com/IBM/data-prep-lab/blob/dev/data-processing-lib/doc/transform-tutorials.md

shahrokhDaijavad commented 4 months ago

@shivdeep-singh-ibm Have you looked at this and found no solution yet? If there is no way to link to Python files in the transforms repo when going from MkDocs to Pages, we should link to the Readme pages in the respective directories instead. As for the formatting issue (the third link above), I think there should be a way to fix this, no?

shivdeep-singh-ibm commented 4 months ago

I have found one way and am preparing a patch for it. The method works for the Python cases; I am trying to handle some corner cases as well.

The approach is to use hooks.py as a hook to mkdocs, which will automatically replace the links (relative links to Python files or relative links to repo folders) with absolute GitHub links to the repo on the fly while generating the documentation.

E.g., [transform](./transform/src/main.py) will become [transform](https://github.com/IBM/data-prep-lab/blob/dev/transform/src/main.py), so clicking the link will open GitHub.

I need to support only
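To make the hook approach described above concrete, here is a minimal sketch of what such a hooks.py could look like. It is not the actual patch. `on_page_markdown` is a standard MkDocs event name, but the helper `rewrite_links`, the constant `REPO_BLOB_URL`, the regex, and the assumption that page paths are relative to the repo root are all illustrative. It handles only relative links to .py files, not the repo-folder links mentioned above.

```python
import posixpath
import re

# Assumed base URL, taken from the shape of the example link in this thread.
REPO_BLOB_URL = "https://github.com/IBM/data-prep-lab/blob/dev"

# Matches inline Markdown links of the form [text](target).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

# Matches targets that start with a URL scheme such as "https:" or "mailto:".
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def rewrite_links(markdown, page_dir=""):
    """Replace relative links to .py files with absolute GitHub links.

    page_dir is the directory of the page inside the repo (hypothetical
    parameter; assumes the docs root is the repo root).
    """

    def _replace(match):
        text, target = match.group(1), match.group(2)
        # Leave absolute URLs and in-page anchors untouched.
        if SCHEME_RE.match(target) or target.startswith("#"):
            return match.group(0)
        # This sketch only rewrites links to Python files.
        if not target.endswith(".py"):
            return match.group(0)
        # Resolve the link relative to the page that contains it.
        repo_path = posixpath.normpath(posixpath.join(page_dir, target))
        return f"[{text}]({REPO_BLOB_URL}/{repo_path})"

    return LINK_RE.sub(_replace, markdown)


def on_page_markdown(markdown, page, config, files):
    """MkDocs calls this with each page's Markdown before rendering it."""
    src_path = page.file.src_path.replace("\\", "/")
    return rewrite_links(markdown, posixpath.dirname(src_path))
```

MkDocs (1.4 and later) loads such a file when it is listed under the `hooks:` key in mkdocs.yml. Keeping the rewriting logic in a plain function separate from the event handler makes it easy to test without running a MkDocs build.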

shahrokhDaijavad commented 4 months ago

Sounds good, @shivdeep-singh-ibm! Thank you.

shahrokhDaijavad commented 4 months ago

I see that the link to the python files has been fixed, but the formatting issue in the page https://ibm.github.io/data-prep-kit/data-processing-lib/doc/architecture/ is still there.

shahrokhDaijavad commented 4 months ago

@shivdeep-singh-ibm Thanks for making the file a lot better by adding new lines. Sorry for nitpicking, but there is still a problem with the indentation of bullets and sub-bullets when I compare the repo Readme with the corresponding Pages version of the architecture.md file. Looking at the markdown file, I see different colors for bullets and sub-bullets. The repo treats this correctly, but Pages doesn't. I think the sub-bullets that are red should become black for the pages to work properly.

Bytes-Explorer commented 4 months ago

@shivdeep-singh-ibm @shahrokhDaijavad Is this done? Can it be closed?

shahrokhDaijavad commented 4 months ago

@Bytes-Explorer and @shivdeep-singh-ibm. This is mostly done. The remaining problem is the indentation of sub-bullets on this page: https://ibm.github.io/data-prep-kit/data-processing-lib/doc/architecture/ (compare with this page: https://github.com/IBM/data-prep-kit/blob/dev/data-processing-lib/doc/architecture.md), in which the sub-bullets in the Ray Orchestrator section and in Data Access under core components are not indented correctly. I don't know if there is a solution for this (maybe adding a return after the corresponding lines in the md file?). It is not a big issue; if there is no solution, we can close it.
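
One possible cause, which this thread does not confirm, is that Python-Markdown (the parser MkDocs uses) expects nested list items to be indented by 4 spaces, while GitHub also accepts 2. If architecture.md indents its sub-bullets by 2 spaces, a sketch like the following could widen them. The function name `widen_list_indent` and its parameters are hypothetical, and it assumes the file consistently uses 2 spaces per nesting level.

```python
import re

# Matches an indented list item: leading spaces, then a bullet (-, *, +)
# or a numbered marker such as "1.", followed by a space.
LIST_ITEM_RE = re.compile(r"^( +)([-*+]|\d+\.) ")


def widen_list_indent(markdown, old=2, new=4):
    """Re-indent nested list items from `old` to `new` spaces per level."""
    out = []
    in_fence = False
    for line in markdown.split("\n"):
        # Track fenced code blocks so their contents are never re-indented.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = LIST_ITEM_RE.match(line)
        if match and not in_fence and len(match.group(1)) % old == 0:
            depth = len(match.group(1)) // old
            line = " " * (depth * new) + line[len(match.group(1)):]
        out.append(line)
    return "\n".join(out)
```

Running this on a file that already uses 4-space indentation would over-indent it, so it should only be applied after checking how the source file is indented.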