Open pitrson opened 3 months ago
Yeah, this was doing me head in the other day. I would get it right when I was in a folder x levels deep, and then it would break if I was further up or down from that level.
The version before worked at http://localhost:8000/level1/. This current one works at http://localhost:8000/level1/level2. As you say above.
I'm open to some ideas. I could work on this in a few weeks.
I was able to get this working with my primitive script (I'm not really experienced with Python ^^), which works around the os.path.relpath problem described on Stack Overflow. I'll try to put together a pull request, but if I can't manage it, I'll share the logic so that you can hopefully implement it.
Debug output showing the files and their tags:
defaultdict(<class 'list'>, {PosixPath('docs/index.md'): ['index_test'], PosixPath('docs/test2/test2_2/index2.md'): ['index_test', 'group_2'], PosixPath('docs/test1/index1.md'): ['index_test', 'group_1'], PosixPath('docs/test1/index2.md'): ['index_test', 'group_1']})
And the generated links (I've implemented a check to exclude link generation for 'self', so that the page the links are generated on doesn't include a useless link to itself):
Generating links on page docs/index.md with pagelist arguments ['index_test', 'group_1']
Creating link for matched file located at docs/test1/index1.md
Link is test1/index1.md
Creating link for matched file located at docs/test1/index2.md
Link is test1/index2.md
Generating links on page docs/test2/test2_2/index2.md with pagelist arguments ['index_test']
Creating link for matched file located at docs/index.md
Link is ../../index.md
Creating link for matched file located at docs/test1/index1.md
Link is ../../test1/index1.md
Creating link for matched file located at docs/test1/index2.md
Link is ../../test1/index2.md
Generating links on page docs/test1/index1.md with pagelist arguments ['index_test']
Creating link for matched file located at docs/index.md
Link is ../index.md
Creating link for matched file located at docs/test2/test2_2/index2.md
Link is ../test2/test2_2/index2.md
Creating link for matched file located at docs/test1/index2.md
Link is ./index2.md
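For anyone skimming the thread, the underlying pitfall seems to be that os.path.relpath treats its start argument as a directory, so passing the page's own file path yields one `..` too many. A minimal sketch of that, using posixpath so the separators are the same on every OS (the two paths are taken from the debug output above; the variable names are just for illustration):

```python
import posixpath

page = "docs/test1/index1.md"
target = "docs/test1/index2.md"

# Naive call: the start argument is treated as a directory named "index1.md",
# so the result climbs one level too far.
naive = posixpath.relpath(target, start=page)

# Starting from the directory that contains the page gives a usable link.
fixed = posixpath.relpath(target, start=posixpath.dirname(page))

print(naive)  # ../index2.md
print(fixed)  # index2.md
```

Note that the second form prints `index2.md` rather than the `./index2.md` shown in the log; both resolve to the same file.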
Cool thanks @pitrson. If you could do a pull request that would be great. Or share the logic you used. Thanks
Hey @alanpt
I'm sharing my primitive script. Most of it is probably not useful for you, but I hope you can implement the important part, which is the realrelpath function and the for loop at the end. I may submit a PR in the upcoming weeks if I find some time (or if you're not faster ^^). Just adjust docsdir to point to your mkdocs docs location to test.
PS. I'm not a Python developer.
from collections import defaultdict
from pathlib import Path
import os
import re

import frontmatter  # third-party package: python-frontmatter

docsdir = 'docs'
md_files = []
pages_tags = defaultdict(list)
pages_pagelist_args = defaultdict(list)

# get all md files and populate the dict with page -> tags
for p in Path(docsdir).rglob('*.md'):
    md_files.append(p)
    data = frontmatter.load(p)
    # pages without a 'tags' entry simply get no tags
    for tag in data.get('tags') or []:
        pages_tags[p].append(tag)
print(pages_tags)

# get files that contain a pagelist command
pagelist_files = []
for md in md_files:
    if b'pagelist' in md.read_bytes():
        pagelist_files.append(md)

# get pagelist arguments
for pgfile in pagelist_files:
    with open(pgfile, 'r', encoding='utf-8') as f:
        pgargs = re.findall(r'\{(pagelist.*?)\}', f.read())
    # split every command into words and drop the 'pagelist' keyword itself
    pgargs = [arg for cmd in pgargs for arg in cmd.split() if arg != 'pagelist']
    pglimit = [x for x in pgargs if x.isdigit()]  # numeric limit, not used below
    pgtags = [x for x in pgargs if not x.isdigit()]
    for tag in pgtags:
        pages_pagelist_args[pgfile].append(tag)
# https://stackoverflow.com/questions/17506552/python-os-path-relpath-behavior
def realrelpath(origin, dest):
    '''Get the relative path between two paths, accounting for filepaths'''
    # get the absolute paths so that strings can be compared
    origin = os.path.abspath(origin)
    dest = os.path.abspath(dest)
    # in cases where the origin is a file, use only its directory name
    if os.path.isfile(origin):
        origin = os.path.dirname(origin)
    if os.path.isfile(dest):
        # changed to dest (as opposed to the post on Stack Overflow):
        # get the relative path between directories, then re-add dest's filename
        filename = os.path.basename(dest)
        return os.path.join(os.path.relpath(os.path.dirname(dest), origin), filename)
    # if dest is not a file, just run relpath as usual
    return os.path.relpath(dest, origin)
# match selected tags only
for page in pages_pagelist_args:
    print('Generating links on page', page, 'with pagelist arguments', pages_pagelist_args[page])
    for mdfile in pages_tags:
        # exclude myself
        if mdfile != page:
            if set(pages_pagelist_args[page]).issubset(pages_tags[mdfile]):
                print('Creating link for matched file located at', mdfile)
                # relative_path = os.path.relpath(mdfile, page)
                relative_path = realrelpath(page, mdfile)
                print('Link is', relative_path)
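The matching rule in that final loop (link to every other page whose tags include all of the pagelist arguments) can be tried without any files on disk. This is only a sketch: the helper name pages_to_link is made up for illustration, and the data is copied from the debug output earlier in the thread.

```python
from pathlib import PurePosixPath


def pages_to_link(pages_tags, current_page, wanted_tags):
    """Return every page except current_page that carries all wanted_tags."""
    wanted = set(wanted_tags)
    return [
        page
        for page, tags in pages_tags.items()
        if page != current_page and wanted.issubset(tags)
    ]


# Same pages and tags as in the debug output above.
pages_tags = {
    PurePosixPath("docs/index.md"): ["index_test"],
    PurePosixPath("docs/test1/index1.md"): ["index_test", "group_1"],
    PurePosixPath("docs/test1/index2.md"): ["index_test", "group_1"],
    PurePosixPath("docs/test2/test2_2/index2.md"): ["index_test", "group_2"],
}

matches = pages_to_link(
    pages_tags, PurePosixPath("docs/index.md"), ["index_test", "group_1"]
)
print([str(p) for p in matches])
# ['docs/test1/index1.md', 'docs/test1/index2.md']
```

That is the same pair of links the log reports for docs/index.md.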
Thanks. Here it is integrated, but I don't have time to test it right now.
import re
import os
from mkdocs.plugins import BasePlugin
from urllib.parse import urlsplit
from pathlib import Path


class PageListPlugin(BasePlugin):
    """
    A MkDocs plugin to generate dynamic lists of pages based on `{pagelist}` commands in markdown files.
    It supports grouping by folder, filtering by tags, and limiting the number of links.
    """

    def __init__(self):
        self.page_list_info = []

    def on_nav(self, nav, config, files):
        self.nav = nav
        self.files = files
        for file in files:
            # only markdown pages can contain a {pagelist} command
            if file.is_documentation_page():
                self._gather_page_list_info(file)

    def _gather_page_list_info(self, file):
        # read the source file, falling back to latin-1 if it is not valid UTF-8
        try:
            with open(file.abs_src_path, 'r', encoding='utf-8') as f:
                content = f.read()
        except UnicodeDecodeError:
            try:
                with open(file.abs_src_path, 'r', encoding='latin-1') as f:
                    content = f.read()
            except Exception as e:
                print(f"Error reading file {file.abs_src_path}: {e}")
                return
        # remember every {pagelist ...} command together with the page it appears on
        for match in re.finditer(r'\{pagelist(?:\s+(\d+|g|i)\s*(.*?))?(?:\|\s*(.*))?\}', content):
            page_list_code = match.group(0)
            page_url = file.url
            self.page_list_info.append({'page_url': page_url, 'page_list_code': page_list_code})

    def on_post_page(self, output, page, config):
        matches = re.finditer(r'\{pagelist(?:\s+(\d+|g|i)\s*(.*?))?(?:\|\s*(.*))?\}', output)
        for match in matches:
            if match.group(1) == 'i':
                page_list_output = self.generate_page_list_info_output(self.page_list_info, page)
                output = output.replace(match.group(0), page_list_output, 1)
            else:
                group_folders = match.group(1) == 'g'
                tags_to_filter = match.group(2).strip().split() if match.group(2) else page.meta.get('tags', [])
                limit = int(match.group(1)) if match.group(1) and match.group(1).isdigit() else None
                folders_to_filter = match.group(3).strip().split() if match.group(3) else []
                filtered_list = self._format_links_by_folder_and_tag(tags_to_filter, page, config, group_folders, limit, folders_to_filter)
                output = output.replace(match.group(0), filtered_list, 1)
        return output

    def generate_page_list_info_output(self, page_list_info, current_page):
        output = '<ol class="page-list-info">'
        for info in page_list_info:
            relative_path = self.realrelpath(current_page.url, info['page_url'])
            output += f"<li><a href='{relative_path}'>{info['page_url']}</a> - {info['page_list_code']}</li>"
        output += '</ol>'
        return output

    def _format_links_by_folder_and_tag(self, tags_to_filter, current_page, config, group_folders, limit, folders_to_filter):
        folder_groups = {}
        # Normalize the folders_to_filter list
        normalized_folders_to_filter = [folder.lower() for folder in folders_to_filter]
        for file in self.files:
            if file.page is not None and self._page_has_tags(file.page, tags_to_filter):
                folder_name = self._extract_folder_name(file.page.url).lower()
                # Check if the folder name matches any of the specified folders to filter
                if folders_to_filter and folder_name not in normalized_folders_to_filter:
                    continue  # Skip this page if its folder is not in the folders_to_filter list
                if folder_name not in folder_groups:
                    folder_groups[folder_name] = []
                folder_groups[folder_name].append(file.page)
        result = '<div class="pagelist">'
        item_count = 0  # Initialize item count
        for folder, pages in folder_groups.items():
            if group_folders:
                result += f'<h3 class="pagelistheading">{folder.capitalize()}</h3>\n'
            result += '<ul class="pagelistlist">\n'
            for page in pages:
                if limit is not None and item_count >= limit:
                    break  # Stop adding links once the limit is reached
                relative_path = self.realrelpath(current_page.url, page.url)
                result += f'<li><a href="{relative_path}">{page.title}</a></li>\n'
                item_count += 1
            result += '</ul>\n'
            if limit is not None and item_count >= limit:
                break  # Break the outer loop as well if the limit is reached
        result += '</div>'
        return result

    def _page_has_tags(self, page, tags_to_filter):
        if not tags_to_filter:
            return False  # Return False if no tags to filter
        page_tags = set(page.meta.get('tags', []))
        any_tags = {tag for tag in tags_to_filter if not tag.startswith('+') and not tag.startswith('-')}
        all_tags = {tag.lstrip('+') for tag in tags_to_filter if tag.startswith('+')}
        exclude_tags = {tag.lstrip('-') for tag in tags_to_filter if tag.startswith('-')}
        any_match = any(tag in page_tags for tag in any_tags) if any_tags else True
        all_match = all(tag in page_tags for tag in all_tags)
        exclude_match = not any(tag in page_tags for tag in exclude_tags)
        return any_match and all_match and exclude_match

    def _extract_folder_name(self, url):
        path_parts = Path(urlsplit(url).path).parts
        relevant_parts = path_parts[:-1]
        folder_title = ' '.join(part.capitalize() for part in relevant_parts)
        return folder_title

    # realrelpath, copied from the script above.
    # Note: the callers in this class pass page URLs, which usually do not exist
    # on disk, so in practice this tends to fall through to plain relpath.
    def realrelpath(self, origin, dest):
        '''Get the relative path between two paths, accounting for filepaths'''
        # get the absolute paths so that strings can be compared
        origin = os.path.abspath(origin)
        dest = os.path.abspath(dest)
        # in cases where the origin is a file, use only its directory name
        if os.path.isfile(origin):
            origin = os.path.dirname(origin)
        if os.path.isfile(dest):
            # get the relative path between directories, then re-add dest's filename
            filename = os.path.basename(dest)
            return os.path.join(os.path.relpath(os.path.dirname(dest), origin), filename)
        # if dest is not a file, just run relpath as usual
        return os.path.relpath(dest, origin)

    def on_files(self, files, config):
        self.files = files
        return files
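The tag filter in _page_has_tags is the part that is easiest to misread, so here is a standalone sketch of the same rule: plain tags mean "any of these", +tag means "must have", -tag means "must not have". The function name page_has_tags and the sample tags are only for illustration; the logic is meant to mirror the method above, not replace it.

```python
def page_has_tags(page_tags, tags_to_filter):
    """Standalone version of the any / +all / -exclude tag rule."""
    if not tags_to_filter:
        return False  # nothing to filter on means no match
    page_tags = set(page_tags)
    any_tags = {t for t in tags_to_filter if not t.startswith(("+", "-"))}
    all_tags = {t.lstrip("+") for t in tags_to_filter if t.startswith("+")}
    exclude_tags = {t.lstrip("-") for t in tags_to_filter if t.startswith("-")}
    any_match = bool(any_tags & page_tags) if any_tags else True
    all_match = all_tags <= page_tags
    exclude_match = not (exclude_tags & page_tags)
    return any_match and all_match and exclude_match


tags = ["index_test", "group_1"]
print(page_has_tags(tags, ["index_test"]))               # True
print(page_has_tags(tags, ["group_2", "index_test"]))    # True, one plain tag is enough
print(page_has_tags(tags, ["+index_test", "+group_2"]))  # False, group_2 is missing
print(page_has_tags(tags, ["index_test", "-group_1"]))   # False, group_1 is excluded
```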
Thanks! I have tested this, and all of the originally described test cases now generate proper links!
One question: why is it not generating links directly to the .md file that matches? It only generates a link to the parent directory, which is IMHO wrong, since you may have multiple docs in the directory. It works in my simple test env, since I have only a single .md per directory and mkdocs automatically performs redirects:
WARNING - [19:53:16] "GET /test1/index1 HTTP/1.1" code 302
WARNING - [19:53:19] "GET /test1/index2 HTTP/1.1" code 302
INFO - [19:53:20] Browser connected: http://localhost:8000/test1/index2/
INFO - [19:53:24] Browser connected: http://localhost:8000/test2/test2_2/index2/
WARNING - [19:55:38] "GET /test1/index2 HTTP/1.1" code 302
INFO - [19:55:40] Browser connected: http://localhost:8000/test1/index2/
E.g. instead of http://localhost:8000/test1/index2/doc.md
it only generates the link http://localhost:8000/test1/index2/
- but this was already the case before you implemented the fix proposed in your last post.
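A possible explanation, offered as a guess rather than a diagnosis: with MkDocs' default use_directory_urls setting, a page such as test1/index2.md is served at /test1/index2/, so a link ending in that directory is probably the expected result. The 302s in the log look like they come from links that lack the trailing slash, which os.path.relpath strips. A minimal sketch of a URL-based relative link that keeps the slash (relative_url is a hypothetical helper, not part of the plugin; MkDocs also ships mkdocs.utils.get_relative_url, which may be worth checking instead):

```python
import posixpath


def relative_url(from_url, to_url):
    """Relative link between two site URLs such as 'test1/index1/'.

    Assumes directory-style URLs, where the page URL is also the base
    directory the browser resolves relative links against.
    """
    start = posixpath.join("/", from_url)
    target = posixpath.join("/", to_url)
    rel = posixpath.relpath(target, start=start)
    # relpath normalizes the trailing slash away, so put it back for directories
    if to_url == "" or to_url.endswith("/"):
        rel += "/"
    return rel


print(relative_url("test2/test2_2/index2/", "test1/index1/"))  # ../../../test1/index1/
print(relative_url("test1/index1/", "test1/index2/"))          # ../index2/
print(relative_url("", "test1/index2/"))                       # test1/index2/
```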
Hi,
I'm not sure whether this is a feature request or a bug, but I have noticed that the links are generated properly only when traversing into sub-directories/documents.
This is my test scenario:
All of the md docs have the index_test tag set. Now when I try to generate the same set of links in each of them using {pagelist 100 index_test }, the following is the result:
WARNING - [19:22:08] "GET /test2/test2_2/index2 HTTP/1.1" code 302
So I assume that pagelist expects the generated links/docs to be in subdirectories? IMHO it should be able to generate links for any document regardless of the documentation structure and their location. E.g. we often try to reference docs from different sections/locations - it's quite common I think.
Would this be possible to fix? Thanks!
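For reference, a sketch of how a command such as {pagelist 100 index_test } breaks down when run through the regular expression used in the plugin code in this thread. The helper name parse_pagelist and the second sample command are made up for illustration; unlike the plugin, this sketch returns an empty tag list instead of falling back to the page's own tags.

```python
import re

# Same pattern as in the plugin code posted in this thread.
PAGELIST_RE = re.compile(r'\{pagelist(?:\s+(\d+|g|i)\s*(.*?))?(?:\|\s*(.*))?\}')


def parse_pagelist(text):
    """Return the parts of the first {pagelist ...} command in text, or None."""
    match = PAGELIST_RE.search(text)
    if match is None:
        return None
    mode, tags, folders = match.groups()
    return {
        "limit": int(mode) if mode and mode.isdigit() else None,
        "group_folders": mode == "g",
        "tags": tags.split() if tags else [],
        "folders": folders.split() if folders else [],
    }


print(parse_pagelist("{pagelist 100 index_test }"))
# {'limit': 100, 'group_folders': False, 'tags': ['index_test'], 'folders': []}
print(parse_pagelist("{pagelist g python +tutorial | guides}"))
# {'limit': None, 'group_folders': True, 'tags': ['python', '+tutorial'], 'folders': ['guides']}
```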