TiddlyWiki / TiddlyWiki5

A self-contained JavaScript wiki for the browser, Node.js, AWS Lambda etc.
https://tiddlywiki.com/

request: sorting by number patterns should be easy #3104

Open pmario opened 6 years ago

pmario commented 6 years ago

Sorting a list of tiddlers that contain patterns like this should be easy

proposed syntax: npsort:suffix[delimiter]

npsort ... numbered pattern sort
suffix ... number of active sections to deal with, e.g. for 1.2.a.b the suffix would be :2
delimiter ... defaults to [.]; examples: [.:] dot or colon; [. ] dot or space

pmario commented 6 years ago

npsort:sections[delimiter]

also possible:

npsort:sections<delimiter>
npsort:sections{delimiter}
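
To make the proposal concrete, here is a minimal sketch in plain JavaScript, outside of TiddlyWiki. The names `splitSections` and `npCompare` are hypothetical and not part of any existing operator. It assumes that `sections` limits how many leading sections are compared, that every character in `delimiters` acts as a separator, and that a section made only of digits is compared numerically.

```javascript
// Hypothetical helper: split a title on any single delimiter character.
function splitSections(title, delimiters) {
  const parts = [];
  let current = "";
  for (const ch of title) {
    if (delimiters.includes(ch)) {
      parts.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current);
  return parts;
}

// Hypothetical comparator: compare only the first `sections` sections.
function npCompare(a, b, sections, delimiters = ".") {
  const pa = splitSections(a, delimiters);
  const pb = splitSections(b, delimiters);
  for (let i = 0; i < sections; i++) {
    const sa = pa[i];
    const sb = pb[i];
    if (sa === undefined || sb === undefined) {
      if (sa === sb) break; // both titles ran out of sections
      return sa === undefined ? -1 : 1; // the shorter title sorts first
    }
    const bothNumeric = /^\d+$/.test(sa) && /^\d+$/.test(sb);
    const diff = bothNumeric
      ? Number(sa) - Number(sb)
      : sa < sb ? -1 : sa > sb ? 1 : 0;
    if (diff !== 0) return diff;
  }
  return 0; // equal within the compared sections
}

const titles = ["1.10.a", "1.2.b", "1.2.a", "1.1"];
console.log([...titles].sort((a, b) => npCompare(a, b, 3)));
// → [ '1.1', '1.2.a', '1.2.b', '1.10.a' ]
```

With `sections` set to 2, "1.2.b" and "1.2.a" compare as equal and keep their original relative order, because `Array.prototype.sort` is stable.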

mwiktowy commented 6 years ago

Another possible use-case would be a "\" or "/" delimiter for sorting URLs or pseudo file paths. But that might lead to a psort and npsort to determine how to sort the stuff between delimiters ... or psortcs or npsortcs for that matter.

Pattern sorting would certainly help my regulation sorting use-case ... but the full sort depth is more complicated than your example (e.g. §111.1(a)(1)(i) ... number.number(alpha)(number)(lower-case roman numeral) ). I am not sure if this concept could be extended to be modular so that a more complex sort filter can be built up by nesting. For example:

psort:numeric[§|§§]:numeric[.]:alpha[(?)]:numeric[(?)]:roman[(?)][TextReference]
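
The modular idea can be sketched as a comparator built from a list of per-level types. This is only an illustration under assumptions: `romanToInt`, `converters` and `makeComparator` are hypothetical names, the segments are assumed to be already extracted from the title, and the Roman numeral conversion does no validation.

```javascript
const ROMAN = { i: 1, v: 5, x: 10, l: 50, c: 100, d: 500, m: 1000 };

// Convert a Roman numeral (either case) to an integer; input is not validated.
function romanToInt(text) {
  const s = text.toLowerCase();
  let total = 0;
  for (let i = 0; i < s.length; i++) {
    const value = ROMAN[s[i]];
    const next = ROMAN[s[i + 1]] || 0;
    // A smaller numeral before a larger one is subtracted (iv = 4, ix = 9).
    total += value < next ? -value : value;
  }
  return total;
}

// One converter per segment type; each maps a label to a sortable integer.
const converters = {
  numeric: (s) => Number(s),
  alpha: (s) => s.toLowerCase().charCodeAt(0) - 96, // single letters only
  roman: romanToInt,
};

// Build a comparator for arrays of segments, one type per level.
function makeComparator(types) {
  return (a, b) => {
    for (let i = 0; i < types.length; i++) {
      // A missing level counts as 0, so shorter entries sort first.
      const va = i < a.length ? converters[types[i]](a[i]) : 0;
      const vb = i < b.length ? converters[types[i]](b[i]) : 0;
      if (va !== vb) return va - vb;
    }
    return 0;
  };
}

const compare = makeComparator(["numeric", "numeric", "alpha", "numeric", "roman"]);
const rows = [
  ["111", "1", "a", "1", "ix"],
  ["111", "1", "a", "1", "v"],
  ["111", "1", "a", "1", "iv"],
];
rows.sort(compare);
console.log(rows.map((r) => r[4])); // → [ 'iv', 'v', 'ix' ]
```

A plain string sort would put "ix" before "v", which is why the Roman level needs its own conversion.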

Also, it might be a good idea to synchronize the syntax with future text-slicing development in mind so that a user can slice, name and sort using one pattern. I haven't found documentation on the new json slicer custom rules yet.

pmario commented 6 years ago

@mwiktowy thx for the feedback.

§111.1(a)(1)(i) ... ouch ;) ...

Is it possible to have §111.1(a)(001)(i) .. §111.1(a)(099)(i)? ... because this would be 2 sections with the numbers and then a simple alphabetic search.
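
A minimal sketch of the zero-padding idea: pad every run of digits to a fixed width and let an ordinary string comparison do the rest. `padNumbers` and `sortByPaddedKey` are hypothetical names, and the width of 3 is an assumption taken from the (001) .. (099) example. Unlike that example, this version pads every digit run in the title, not only the paragraph number.

```javascript
// Pad each run of digits to `width` characters; longer runs are left as-is.
function padNumbers(title, width = 3) {
  return title.replace(/\d+/g, (digits) => digits.padStart(width, "0"));
}

// Sort a copy of the titles by their padded form, using plain string order.
function sortByPaddedKey(titles, width = 3) {
  return [...titles].sort((a, b) => {
    const ka = padNumbers(a, width);
    const kb = padNumbers(b, width);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

console.log(padNumbers("§111.1(a)(1)(i)")); // → §111.001(a)(001)(i)
console.log(sortByPaddedKey(["§111.1(a)(10)(i)", "§111.1(a)(2)(i)", "§111.1(a)(1)(i)"]));
// → [ '§111.1(a)(1)(i)', '§111.1(a)(2)(i)', '§111.1(a)(10)(i)' ]
```

Padding only fixes the numeric levels. The Roman numeral level still sorts alphabetically, so "ix" would come before "v".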

mwiktowy commented 6 years ago

It is more fun than that really. There is first a "Title" that ranges from 1-50 and Chapter (Upper case Roman numerals) and Subchapter (Upper-case Letter). I ignore all that since normally an office is acting under one title/chapter/subchapter.

There are likely more extensive regulations than what I deal with, but the 111. part typically stays under 1000 and is not zero padded. The .1 part can range up to a few thousand, but there are typically several Appendices to the 111. part named A111., B111., C111., ... etc. I just fake the sorting of those using extra .1 subsections counting up by hundreds after the real .1 subsections are finished ... so if the final one is 111.123, I will number Appendix A of 111 as 111.200.

Typically, the (1) part does not get very large ... I don't recall it getting higher than 10 and does not get padded with leading zeros. However, I was considering using the nifty new formula plugin to make a sortable number (= section + subsection/1000 =)

On top of that, if you look at the (a) part ... when you get to (z), it starts again at (aa), (ab), (ac) ... luckily none of the parts I deal with go up that high but some do reach (i) ... which makes things fun when paragraph (i)-letter has a sub-sub-paragraph (i)-roman-numeral.
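
The letter sequence described here, (a) .. (z) followed by (aa), (ab), (ac), behaves like spreadsheet column names, so it can be mapped to an ordinal number. This is a sketch under that assumption, and `alphaToInt` is a hypothetical name.

```javascript
// Map a letter label to its position: a=1 ... z=26, aa=27, ab=28, ...
function alphaToInt(label) {
  let n = 0;
  for (const ch of label.toLowerCase()) {
    n = n * 26 + (ch.charCodeAt(0) - 96); // 'a' has char code 97
  }
  return n;
}

const labels = ["aa", "b", "z", "a", "ab"];
labels.sort((x, y) => alphaToInt(x) - alphaToInt(y));
console.log(labels); // → [ 'a', 'b', 'z', 'aa', 'ab' ]
console.log(alphaToInt("i")); // → 9, when (i) is read as a letter
```

The same label (i) would be 1 when read as a Roman numeral, so a sorter cannot tell the two apart from the text alone. It has to know which level of the hierarchy the label sits on.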

And I forgot another layer ... there is an upper case letter after all of this.

So the complete definition is:

"Title" ## "Chapter" R "Subchapter" A "Part" ### ["SubPart" A §|A].####(aa)(#)(r)(A)

where:
# is a number
A is an upper-case letter
a is a lower-case letter
R is an upper-case Roman numeral
r is a lower-case Roman numeral

The Part and SubPart in the middle can be largely skipped over since each section, §, or Appendix, A, reiterates the part and the subsection #### is sequential throughout all the subparts. See the eCFRs for examples ... this is a pretty deep example: https://www.ecfr.gov/cgi-bin/text-idx?node=se14.4.417_1111&rgn=div8 ... this is as far down as I parse to avoid nested list nightmares ... all of this is in one tiddler.

So breaking this down in a tag-linked hierarchy with meaningful tiddler titles and sane sorting so that you can transclude them all back together through a template at any level has been ... challenging.

pmario commented 6 years ago

So breaking this down in a tag-linked hierarchy with meaningful tiddler titles and sane sorting so that you can transclude them all back together through a template at any level has been ... challenging.

Holy smokes! ... Yeah, that is challenging. ... But if we could do proper sorting with that system, we should nail many other systems too :)

pmario commented 6 years ago

Is there an official specification of the numbering scheme? ...

pmario commented 6 years ago

Also, it might be a good idea to synchronize the syntax with future text-slicing development in mind so that a user can slice, name and sort using one pattern. I haven't found documentation on the new json slicer custom rules yet.

Do you need to import text into your TW, or do you just need the numbering scheme for proper sorting of your remarks?

Do you use text-slicer with your content?

pmario commented 6 years ago

So, to see if I understand it:

If I were pointed to: Title 14 → Chapter III → Subchapter C → Part 417 → Subpart B

I would find:

Title 14: Aeronautics and Space PART 417—LAUNCH SAFETY Subpart B—Launch Safety Responsibilities

Where PART is the same as §417 ... There is no other §417 ... right?

pmario commented 6 years ago

§417.111(h)(1)(iii)(A) would lead me to the text section: "Date and time of occurrence." ... right?

So my question is, how do you say: Go to: §417.111(h)(1)(iii)(A) in plain English

I'll start with: "paragraph 417 dot 111" ????

mwiktowy commented 6 years ago

The most definitive source I found is here: https://www.archives.gov/files/federal-register/tutorial/tutorial_060.pdf

14 CFR 431 with CFR Tools.zip

Attached is one of the regs (in exported JSON format) that I parsed with some templates I use to group and display them.

I just cut and pasted them from the eCFR into an empty tiddler and then manually parsed it since I wanted some more control over the resulting tiddler name than the slicer tool gave. Plus I chopped it up before the slicer and the super-useful "Excise" tool existed.

There is some inconsistency in the "uniform naming system" in that if a section only has one paragraph, sometimes they won't label it (but that varies part to part). Also, there is sometimes some preamble before starting the paragraph labels.

pmario commented 6 years ago

just a reminder

Paragraph Levels
Sections may contain up to 6 levels of paragraphs.

We strongly recommend agencies use no more than 3 levels.

Paragraph    Designations         Cite paragraph as
Level 1   (a), (b), (c), etc.     § 303.1(a)
Level 2   (1), (2), (3), etc.     § 303.1(a)(1)
Level 3   (i), (ii), (iii), etc.  § 303.1(a)(1)(i)
Level 4   (A), (B), (C), etc.     § 303.1(a)(1)(i)(A)
Level 5   (1), (2), (3), etc.     § 303.1(a)(1)(i)(A)(1)
Level 6   (i), (ii), (iii), etc.  § 303.1(a)(1)(i)(A)(1)(i)
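
The table above implies that the type of a paragraph label follows from its position, which resolves the (i)-letter versus (i)-numeral ambiguity. A minimal sketch of a parser built on that assumption follows. `LEVEL_TYPES` and `parseCitation` are hypothetical names, and the pattern only covers citations shaped like the examples in the table.

```javascript
// Label type for paragraph levels 1..6, taken from the table above.
const LEVEL_TYPES = ["alpha", "numeric", "roman", "alpha", "numeric", "roman"];

// Split a citation such as "§ 303.1(a)(1)(i)" into part, section and levels.
// Returns null when the text does not match the assumed shape.
function parseCitation(citation) {
  const match = /^§\s*(\d+)\.(\d+)((?:\([^()]+\))*)$/.exec(citation.trim());
  if (!match) return null;
  const labels = match[3].match(/[^()]+/g) || [];
  if (labels.length > LEVEL_TYPES.length) return null;
  return {
    part: Number(match[1]),
    section: Number(match[2]),
    paragraphs: labels.map((label, i) => ({
      level: i + 1,
      type: LEVEL_TYPES[i], // decided by position, not by the label text
      label,
    })),
  };
}

const parsed = parseCitation("§417.111(h)(1)(iii)(A)");
console.log(parsed.part, parsed.section); // → 417 111
console.log(parsed.paragraphs.map((p) => p.type + ":" + p.label));
// → [ 'alpha:h', 'numeric:1', 'roman:iii', 'alpha:A' ]
```

Each typed label could then be converted to a number and the resulting arrays compared level by level.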

joshuafontany commented 5 years ago

You might want to have a go at running it through my "alphanumeric tokenized sort" filter, which I included to sort tw-lists of long/json/path/indexes/with/numerals/0/523/etc.

The tokenization javascript I'm borrowing is really well thought out, so it might handle most of what you want. It is included in my JsonMangler plugin:

https://joshuafontany.github.io/TW5-JsonManglerPlugin/

Once imported, you can use tsort[] in any filter chain (tsort[true] for case-sensitive sorting). Here is the backing lib: https://github.com/joshuafontany/TW5-JsonManglerPlugin/blob/master/modules/libs/alphanum.js

And the tw-filter wrapper: https://github.com/joshuafontany/TW5-JsonManglerPlugin/blob/master/modules/filters/tsort.js
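
For readers who do not want to open the links, the general idea of an alphanumeric tokenized sort can be sketched as follows. This is not the plugin's code. `tokenize` and `alphanumCompare` are hypothetical names, and the `caseSensitive` flag only mirrors the tsort[true] behaviour described above as an assumption.

```javascript
// Split text into alternating runs of digits and non-digits.
function tokenize(text) {
  return text.match(/\d+|\D+/g) || [];
}

// Compare token by token: digit runs numerically, everything else as strings.
function alphanumCompare(a, b, caseSensitive = false) {
  const ta = tokenize(caseSensitive ? a : a.toLowerCase());
  const tb = tokenize(caseSensitive ? b : b.toLowerCase());
  const shared = Math.min(ta.length, tb.length);
  for (let i = 0; i < shared; i++) {
    const x = ta[i];
    const y = tb[i];
    if (x === y) continue;
    if (/^\d/.test(x) && /^\d/.test(y)) {
      const diff = Number(x) - Number(y);
      if (diff !== 0) return diff;
      continue; // same value, different padding, e.g. "01" and "1"
    }
    return x < y ? -1 : 1;
  }
  return ta.length - tb.length; // the shorter token list sorts first
}

const paths = ["path/10/b", "path/2/a", "path/2/B", "path/1"];
console.log([...paths].sort((a, b) => alphanumCompare(a, b)));
// → [ 'path/1', 'path/2/a', 'path/2/B', 'path/10/b' ]
```

Unlike a delimiter-based split, this approach needs no knowledge of the separator, but it also cannot tell a Roman numeral from a letter.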