delph-in / docs

DELPH-IN Documentation
https://delph-in.github.io/docs/

Page not imported? #18

Open emilymbender opened 3 years ago

emilymbender commented 3 years ago

It looks like this page didn't get imported: http://moin.delph-in.net/wiki/CambridgeSEM-I

It's world readable, so I wonder if the problem is that the page name is a bit odd (has a hyphen) and if so, if there might be other pages that weren't imported.

@arademaker can you import it and also see if maybe there are others?

It also looks like links to the page will need to be updated. I discovered it was missing by looking here:

https://github.com/delph-in/docs/wiki/RmrsDiscussions

arademaker commented 3 years ago

I have manually copied the page and converted it to markdown: https://github.com/delph-in/docs/wiki/CambridgeSEM-I. You are right, my code must have missed that page, either because of the hyphen-like character in the name or because MoinMoin had two very similar pages:

CambridgeSEM\(28\)2d\(29\)I/  CambridgeSEM\(2d\)I/

The first was deleted, so maybe I did something wrong! Thank you for opening the issue. I fixed the links in https://github.com/delph-in/docs/wiki/CambridgeSchedule and https://github.com/delph-in/docs/wiki/RmrsDiscussions (needs some editing to improve the format).

and also see if maybe there are others?

Not sure at this stage how I can check that. In the dump from MoinMoin I have 1266 pages:

% ls | wc -l
    1266

In the current wiki we have 1057 pages:

% ls | wc -l
    1057

But many MoinMoin pages were intentionally removed:

  1. personal pages
  2. system pages
  3. restricted pages that @oepen listed to me

We do have some weird names in the new wiki, but the content looks right:

% ls | rg "[^a-zA-Z0-9_.]"
CambridgeSEM-I.md
LtgOslo_Hank(c3b8).md
LtgOslo_Hank(c3b8)Retreat.md
MatrixDoc_Nominalized(20)Clauses.md
Saabr(c3bc)ckenTop.md
Saarbr(c3bc)ckenTop.md
SideSt(c3b8)rrelse.md
Singapore(20)Top.md
TheAbbey_Chrysalis2014PpAttachment(5d).md
ToolsTop_converter(2e)html.md
Usability_ease(20)of(20)set(2d)up.md

In the MoinMoin dump we have

% ls | rg "[^a-zA-Z0-9_.]"
(28)c396(29)nskadeSidor
(28)c396(29)vergivnaSidor
(28)c398(29)nskedeSider
(c396)nskadeSidor
(c396)vergivnaSidor
(c398)nskedeSider
4(28)2d(29)16_Meeting_Notes
4(2d)16_Meeting_Notes
Aktuelle(28)c384(29)nderungen
Aktuelle(c384)nderungen
Anv(28)c3a4(29)ndarInst(28)c3a4(29)llningar
Anv(c3a4)ndarInst(c3a4)llningar
CambridgeSEM(28)2d(29)I
CambridgeSEM(2d)I
ChangementsR(28)c3a9(29)cents
ChangementsR(c3a9)cents
ClarinoTop(2f)RelatedWork
ClarinoTop(2f)RequirementsSurvey
ClarinoTop(2f)TechnologySurvey
Climb(2f)GClimb
Climb(2f)GClimb(2f)German
DeepBank(2f)OneOne
DeepBank(2f)OneZero
DelphinTutorial(2f)Distributions
DelphinTutorial(2f)Formalisms
DelphinTutorial(2f)Grammars
DelphinTutorial(2f)Processing
ErgProcessing(2f)ExportExample
ErgProcessing(2f)SampleExport
ErgSemantics(2f)Apposition
ErgSemantics(2f)Basics
ErgSemantics(2f)Ccs
ErgSemantics(2f)Comparatives
ErgSemantics(2f)Compounding
ErgSemantics(2f)Conditionals
ErgSemantics(2f)ControlRelations
ErgSemantics(2f)Conventions
ErgSemantics(2f)Coordination
ErgSemantics(2f)Design
ErgSemantics(2f)Discovery
ErgSemantics(2f)Ellipsis
ErgSemantics(2f)Essence
ErgSemantics(2f)ForeignExpressions
ErgSemantics(2f)Fragments
ErgSemantics(2f)Fundamentals
ErgSemantics(2f)HowToCite
ErgSemantics(2f)IdentityCopulae
ErgSemantics(2f)Imperatives
ErgSemantics(2f)ImplicitLocatives
ErgSemantics(2f)ImplicitNominals
ErgSemantics(2f)ImplicitQuantifiers
ErgSemantics(2f)InstrumentalRelatives
ErgSemantics(2f)Interface
ErgSemantics(2f)Internals
ErgSemantics(2f)Inventory
ErgSemantics(2f)MeasurePhrases
ErgSemantics(2f)Nominalization
ErgSemantics(2f)NonAdverbialClausalModifiers
ErgSemantics(2f)NonScopalModifiers
ErgSemantics(2f)Notes
ErgSemantics(2f)NumberSequences
ErgSemantics(2f)Parentheticals
ErgSemantics(2f)Partitives
ErgSemantics(2f)Predicates
ErgSemantics(2f)PropositionalArguments
ErgSemantics(2f)Quantification
ErgSemantics(2f)QuasiModalInfinitivals
ErgSemantics(2f)RelationalNouns
ErgSemantics(2f)RunOnConstruction
ErgSemantics(2f)Template
ErgSemantics(2f)Terminology
ErgSemantics(2f)TimeExpressions
ErgSemantics(2f)ToDo
ErgSemantics(2f)Vocatives
ErgTokenization(2f)ComplexExample
EventStats(2f)HitCounts
EventStats(2f)UserAgents
F(28)c3b6(29)r(28)c3a4(29)ldrarl(28)c3b6(29)saSidor
F(c3b6)r(c3a4)ldrarl(c3b6)saSidor
FeforPlenum(2f)Formalism
FeforPlenum(2f)LexicalAcquisitionImmatureGrammars
For(28)c3a6(29)ldrel(28)c3b8(29)seSider
For(c3a6)ldrel(c3b8)seSider
ItsdbTreebanking(2f)ItsdbAnnotation
ItsdbTreebanking(2f)ItsdbExporting
ItsdbTreebanking(2f)ItsdbModeling
ItsdbTreebanking(2f)ItsdbTrouble
ItsdbTreebanking(2f)ItsdbUpdating
ItsdbTsdb(2f)ProcessingRelations
JimWhite(2f)StarSemTokenTabulation
KyotoSchedule(2f)InterDelphinNotes
KyotoTop(2f)InterWiki
LapDevelopment(2f)Abel
LapDevelopment(2f)Accounting
LapDevelopment(2f)Annotations
LapDevelopment(2f)Blog
LapDevelopment(2f)DKPROCompilation
LapDevelopment(2f)Deployment
LapDevelopment(2f)DkPro
LapDevelopment(2f)Environment
LapDevelopment(2f)Giellatekno
LapDevelopment(2f)Hackathons
LapDevelopment(2f)Internals
LapDevelopment(2f)Library
LapDevelopment(2f)MongoDB
LapDevelopment(2f)Production
LapDevelopment(2f)Schedule
LapDevelopment(2f)ServerDeployment
LapDevelopment(2f)SeverDeployment
LapDevelopment(2f)Status
LapDevelopment(2f)Tasks
LapDevelopment(2f)Tests
LapDevelopment(2f)ToE
LapDevelopment(2f)Tree
LogonInstallation(2f)CvsBasics
LogonInstallation(2f)InstallationBasics
LogonMrs(2f)InformationStructure
LogonMrs(2f)MessageRelations
LogonProcessing(2f)BatchGeneration
LogonProcessing(2f)BatchParsing
LogonProcessing(2f)BatchTranslation
LogonTest(2f)BenchmarkingSuite
LtgOslo(2f)BibTeX
LtgOslo(2f)Cristin
LtgOslo(2f)Delphin
LtgOslo(2f)EndreAalrust
LtgOslo(2f)Goals
LtgOslo(2f)Hank(28)c3b8(29)
LtgOslo(2f)Hank(c3b8)
LtgOslo(2f)LaTeX
LtgOslo(2f)Linux
LtgOslo(2f)MSc
LtgOslo(2f)MajaBuljan
LtgOslo(2f)MarteSvalastoga
LtgOslo(2f)Norsk
LtgOslo(2f)Oscarsborg
LtgOslo(2f)TechTalks
LtgOslo(2f)TechTalks16
LtgOslo(2f)TechTalksH2016
LtgOslo(2f)WorkDuties
MatrixDoc(2f)AdnominalPossession
MatrixDoc(2f)ArgumentOptionality
MatrixDoc(2f)Case
MatrixDoc(2f)ClausalComplements
MatrixDoc(2f)ClausalModifiers
MatrixDoc(2f)Coordination
MatrixDoc(2f)DirectInverse
MatrixDoc(2f)Evidentials
MatrixDoc(2f)Gender
MatrixDoc(2f)GeneralInfo
MatrixDoc(2f)ImportToolboxLexicon
MatrixDoc(2f)InformationStructure
MatrixDoc(2f)Lexicon
MatrixDoc(2f)Morphology
MatrixDoc(2f)NominalizedClauses
MatrixDoc(2f)Number
MatrixDoc(2f)OtherFeatures
MatrixDoc(2f)Person
MatrixDoc(2f)SententialNegation
MatrixDoc(2f)TenseAspectMood
MatrixDoc(2f)TestByGeneration
MatrixDoc(2f)TestSentences
MatrixDoc(2f)WhQ
MatrixDoc(2f)WordOrder
MatrixDoc(2f)YesNoQ
MoinMoin(2f)InstallDocs
MoinMoin(2f)InstallationsAnleitung
MoinMoin(2f)TextFormatting
MtJaen(2f)MtJaenTanaka
OpenissuesTop(2f)GrammarMatrixClitic
OpenissuesTop(2f)GrammarMatrixSerialVerbConstructions
OpenissuesTop(2f)GrammarMatrixTenseAspect
PageAl(28)c3a9(29)atoire
PageAl(c3a9)atoire
PagesAbandonn(28)c3a9(29)es
PagesAbandonn(c3a9)es
PagesSouhait(28)c3a9(29)es
PagesSouhait(c3a9)es
PhonologyTop(2f)FrenchPhonemes
PhonologyTop(2f)InterWiki
Pr(28)c3a9(29)f(28)c3a9(29)rencesUtilisateur
Pr(c3a9)f(c3a9)rencesUtilisateur
S(28)c3b6(29)kSida
S(c3b6)kSida
SeitenGr(28)c3b6c39f(29)e
SeitenGr(c3b6c39f)e
Senaste(28)c384(29)ndringar
Senaste(c384)ndringar
SideSt(28)c3b8(29)rrelse
SideSt(c3b8)rrelse
Singapore(20)Top
Singapore(28)20(29)Top
SynSem(2f)Activities
SynSem(2f)Activities(2f)AnnotationConsistency
SynSem(2f)Activities(2f)ControlRaising
SynSem(2f)Activities(2f)Coordination
SynSem(2f)Activities(2f)DependentDimensions
SynSem(2f)Activities(2f)ExtrinsicParserEvaluation
SynSem(2f)Activities(2f)Gapping
SynSem(2f)Activities(2f)GramRel
SynSem(2f)Activities(2f)IdentitySyntax
SynSem(2f)Activities(2f)PcdrtEllipsis
SynSem(2f)Activities(2f)PcdrtEllipsis(2f)10Oct2017
SynSem(2f)Activities(2f)PcdrtEllipsis(2f)25Sept2017
SynSem(2f)Activities(2f)PolymorphicVariadicPredicates
SynSem(2f)Activities(2f)UdMeaningConstruction
SynSem(2f)Candidates
SynSem(2f)Impressions
SynSem(2f)Launch
SynSem(2f)LysebuResources
SynSem(2f)MeaningConstruction
SynSem(2f)MeaningRepresentation
SynSem(2f)Planning
SynSem(2f)Problems
SynSem(2f)Problems(2f)ERGQuantification
SynSem(2f)Problems(2f)ScopalNonScopal
SynSem(2f)Problems(2f)UDDeterminers
TheAbbey(2f)Chrysalis2014
TheAbbey(2f)Chrysalis2014Arity
TheAbbey(2f)Chrysalis2014BindingTheory
TheAbbey(2f)Chrysalis2014DeverbalNouns
TheAbbey(2f)Chrysalis2014Nominalization
TheAbbey(2f)Chrysalis2014OpenEndedPredicates
TheAbbey(2f)Chrysalis2014PossessiveIdioms
TheAbbey(2f)Chrysalis2014PpAttachment
TheAbbey(2f)Chrysalis2014ProperNouns
TheAbbey(2f)Chrysalis2014ProperNounsGeneration
TheAbbey(2f)Chrysalis2014SchrodingerMrs
TheAbbey(2f)Chrysalis2014Terminology
TheAbbey(2f)Chrysalis2014WhatsThePoint
Tilf(28)c3a6(29)ldigSide
Tilf(c3a6)ldigSide
ToolsTop(2f)converter(28)2e(29)html
ToolsTop(2f)converter(2e)html
Tu(28)e1baa5(29)nAnhL(28)c3aa(29)
Tu(e1baa5)nAnhL(c3aa)
TuanAnhLe(2f)GramEng4Dummies
WeSearch(2f)Adaptation
WeSearch(2f)Adaptation(2f)Background
WeSearch(2f)AnalysisCatalog
WeSearch(2f)Berlin
WeSearch(2f)Ccs
WeSearch(2f)CcsDayOne
WeSearch(2f)CcsDayThree
WeSearch(2f)CcsDayTwo
WeSearch(2f)ChartPruning
WeSearch(2f)DataCollection
WeSearch(2f)Demonstrator
WeSearch(2f)DescriptiveStatistics
WeSearch(2f)DesignPrinciples
WeSearch(2f)DocumentParsing
WeSearch(2f)FeforTopics
WeSearch(2f)Hank(28)c3b8(29)Schedule
WeSearch(2f)Hank(28)c3b8(29)TheRest
WeSearch(2f)Hank(c3b8)Schedule
WeSearch(2f)Hank(c3b8)TheRest
WeSearch(2f)ICONS
WeSearch(2f)Interface
WeSearch(2f)LexicalFiltering
WeSearch(2f)ParserAdaptation
WeSearch(2f)ParserEvaluation
WeSearch(2f)PestExamples
WeSearch(2f)QueryLanguage
WeSearch(2f)Rdf
WeSearch(2f)ReadingGroup
WeSearch(2f)RealisticTextParsing
WeSearch(2f)Resa
WeSearch(2f)ScopalArgCoord
WeSearch(2f)SentenceSegmentation
WeSearch(2f)StarSem
WeSearch(2f)StarSem(2f)MrsCrawling
WeSearch(2f)StarSem(2f)MrsCrawlingEvaluation
WeSearch(2f)StarSem(2f)MrsCrawlingOracle
WeSearch(2f)StarSem(2f)MrsReadingGroup
WeSearch(2f)StarSem(2f)UiO
WeSearch(2f)SuperTagging
WeSearch(2f)SuperTagging(2f)Setup
WeSearch(2f)Tokenization
WeSearch(2f)TripleStores
WeSearch(2f)UberTagging
WeSearch(2f)UnderspecifedAttachment
WeSearch(2f)UnderspecifiedPreds
WeSearch(2f)VariablePropertySharing
WikiSandL(28)c3a5(29)da
WikiSandL(c3a5)da
https(3a2f2f)students(2e)washington(2e)edu(2f)olzama(2f)ge
venue(28)2d(29)map(28)2e(29)png
venue(2d)map(2e)png

But some entries in MoinMoin are garbage; see the last two, whose content is an image. Many pages were imported correctly but renamed, e.g. from MatrixDoc(2f)Lexicon to https://github.com/delph-in/docs/wiki/MatrixDoc_Lexicon (because MoinMoin supported subpages). Many pages under the WeSearch prefix were protected and not imported.

One more case similar to the CambridgeSEM-I page:

ToolsTop(2f)converter(28)2e(29)html
ToolsTop(2f)converter(2e)html

I have just manually created https://github.com/delph-in/docs/wiki/ToolsTop_converter.

arademaker commented 3 years ago

Pages

LtgOslo(2f)Hank(28)c3b8(29)
LtgOslo(2f)Hank(c3b8)

The first was deleted and the second is protected in MoinMoin, so I removed them from here:

0e28907d5461412626da14ea103680c31f7ea951 (HEAD -> master, origin/master) Destroyed LtgOslo_Hank(c3b8)Retreat (markdown)
:100644 000000 4df770ab 00000000 D      LtgOslo_Hank(c3b8)Retreat.md
63049d2ec3ed739c829debb7623cb210ca533027 Destroyed LtgOslo_Hank(c3b8) (markdown)
:100644 000000 4df770ab 00000000 D      LtgOslo_Hank(c3b8).md
arademaker commented 3 years ago

Help needed! Can someone see any important page in the lists above that is not in the current wiki?

arademaker commented 3 years ago

Pages

Saabr(c3bc)ckenTop.md
Saarbr(c3bc)ckenTop.md

were duplicates (related to #25). I fixed the name and merged the contents in https://github.com/delph-in/docs/wiki/SaabruckenTop.

goodmami commented 3 years ago

I think all those (2f) kinds of things are when whatever converter you used tried to escape the punctuation. They are hexadecimal values for ASCII characters (illustrated in Python (sorry) below):

>>> chr(int('2d', 16))  # convert base-16 int to character
'-'
>>> chr(int('2f', 16))
'/'

Although the one for SaarbrückenTop is strange:

>>> chr(int('c3bc', 16))
'쎼'
>>> hex(ord('ü'))  # going the other way
'0xfc'
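
A possible explanation, offered as an assumption about the converter rather than something confirmed in this thread: the hex digits may be UTF-8 bytes rather than a Unicode code point. The two bytes c3 bc are the UTF-8 encoding of ü, which a short Python sketch can check (decode_hex_bytes is a hypothetical helper name):

```python
# Assumption: the escape holds UTF-8 bytes written in hex, not a code point.
def decode_hex_bytes(hex_digits: str) -> str:
    """Interpret a run of hex digits as UTF-8 encoded bytes."""
    return bytes.fromhex(hex_digits).decode("utf-8")

print(decode_hex_bytes("c3bc"))      # the escape in Saarbr(c3bc)ckenTop
print(decode_hex_bytes("c3b8"))      # the escape in Hank(c3b8)
print(decode_hex_bytes("c3b6c39f"))  # two characters, as in SeitenGr(c3b6c39f)e
```

Under this reading, single-byte escapes such as (2d) and (2f) decode to the same ASCII characters as before, so the two interpretations only differ for non-ASCII names.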

Then the CambridgeSEM\(28\)2d\(29\)I/ vs CambridgeSEM\(2d\)I/ thing is because those escapes were, themselves, escaped:

>>> chr(int('28', 16))
'('
>>> chr(int('29', 16))
')'
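
The double escaping can be undone mechanically. This is a minimal sketch, assuming every parenthesized group is a run of lowercase hex digits that encodes UTF-8 bytes; unquote_once is a hypothetical helper, not a MoinMoin function:

```python
import re

# Assumption: every "(...)" group is lowercase hex encoding UTF-8 bytes.
HEX_GROUP = re.compile(r"\(((?:[0-9a-f]{2})+)\)")

def unquote_once(name: str) -> str:
    """Undo one level of (hex) escaping in a page name."""
    return HEX_GROUP.sub(
        lambda m: bytes.fromhex(m.group(1)).decode("utf-8"), name
    )

once = unquote_once("CambridgeSEM(28)2d(29)I")  # removes the outer escaping
twice = unquote_once(once)                       # removes the inner escaping
print(once)
print(twice)
```

Applying the function once to a doubly escaped name yields the singly escaped form, which may be a handy way to pair up the duplicates in the list.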

It looks like all the ones with only (2f) (/) used _ instead and are imported already. The ones with dashes (CambridgeSEM-I) are presented in the browser with the dash as a space (see here). With this in mind I whittled down your list a bit. I don't have the Moin dump so I copied your file list above as moin.txt, then I created two normalized lists of files like this:

$ cat moin.txt | sed -e 's/(2f)/_/g' -e 's/(2d)/-/g' -e 's/$/.md/' | sort > moin-norm.txt  # comm needs sorted input
$ ls | grep "[^a-zA-Z0-9.]" | sort > current.txt

Then I can find which ones are not already ported:

$ comm -2 -3 moin-norm.txt current.txt  # suppress lines unique to current.txt and lines in common; show only lines unique to moin-norm.txt
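
For anyone without comm at hand, the same comparison can be sketched in Python as a set difference. The function name and the three sample file names (taken from the lists in this thread) are only for illustration:

```python
def only_in_first(first, second):
    """Return names present in `first` but absent from `second`, sorted."""
    return sorted(set(first) - set(second))

# Illustrative samples only; the real inputs would be the two .txt files.
moin_norm = ["CambridgeSEM-I.md", "MatrixDoc_WhQ.md", "WeSearch_Berlin.md"]
current = ["CambridgeSEM-I.md"]

missing = only_in_first(moin_norm, current)
print(missing)
```

Unlike comm, a set difference does not depend on the inputs being sorted the same way.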

It produces the following list, which I have manually sorted and annotated:

# System pages (I'm just guessing for the non-English titles)
(28)c396(29)nskadeSidor.md
(28)c396(29)vergivnaSidor.md
(28)c398(29)nskedeSider.md
Aktuelle(28)c384(29)nderungen.md
Aktuelle(c384)nderungen.md
Anv(28)c3a4(29)ndarInst(28)c3a4(29)llningar.md
Anv(c3a4)ndarInst(c3a4)llningar.md
(c396)nskadeSidor.md
(c396)vergivnaSidor.md
(c398)nskedeSider.md
ChangementsR(28)c3a9(29)cents.md
ChangementsR(c3a9)cents.md
F(28)c3b6(29)r(28)c3a4(29)ldrarl(28)c3b6(29)saSidor.md
F(c3b6)r(c3a4)ldrarl(c3b6)saSidor.md
For(28)c3a6(29)ldrel(28)c3b8(29)seSider.md
For(c3a6)ldrel(c3b8)seSider.md
MoinMoin_InstallationsAnleitung.md
MoinMoin_InstallDocs.md
MoinMoin_TextFormatting.md
PageAl(28)c3a9(29)atoire.md
PageAl(c3a9)atoire.md
PagesAbandonn(28)c3a9(29)es.md
PagesAbandonn(c3a9)es.md
PagesSouhait(28)c3a9(29)es.md
PagesSouhait(c3a9)es.md
Pr(28)c3a9(29)f(28)c3a9(29)rencesUtilisateur.md
Pr(c3a9)f(c3a9)rencesUtilisateur.md
S(28)c3b6(29)kSida.md
S(c3b6)kSida.md
SeitenGr(28)c3b6c39f(29)e.md
SeitenGr(c3b6c39f)e.md
Senaste(28)c384(29)ndringar.md
Senaste(c384)ndringar.md
SideSt(28)c3b8(29)rrelse.md
Tilf(28)c3a6(29)ldigSide.md
Tilf(c3a6)ldigSide.md
WikiSandL(28)c3a5(29)da.md
WikiSandL(c3a5)da.md

# Personal pages or accidental (?) pages
https(3a2f2f)students(2e)washington(2e)edu_olzama_ge.md
LtgOslo_Cristin.md
Tu(28)e1baa5(29)nAnhL(28)c3aa(29).md
Tu(e1baa5)nAnhL(c3aa).md
venue(28)2d(29)map(28)2e(29)png.md
venue-map(2e)png.md
Singapore(28)20(29)Top.md  # see SingaporeTop

# Other duplicates from bad escaping
4(28)2d(29)16_Meeting_Notes.md
CambridgeSEM(28)2d(29)I.md
LtgOslo_Hank(28)c3b8(29).md
ToolsTop_converter(28)2e(29)html.md
WeSearch_Hank(28)c3b8(29)Schedule.md
WeSearch_Hank(28)c3b8(29)TheRest.md

# Potentially good pages; some already converted
4-16_Meeting_Notes.md
ClarinoTop_RelatedWork.md
ClarinoTop_RequirementsSurvey.md
ClarinoTop_TechnologySurvey.md
ErgProcessing_ExportExample.md
ErgSemantics_Fundamentals.md
ErgSemantics_NonScopalModifiers.md
ErgSemantics_RunOnConstruction.md
ItsdbTreebanking_ItsdbTrouble.md
KyotoTop_InterWiki.md
LapDevelopment_Abel.md
LapDevelopment_SeverDeployment.md
LapDevelopment_Tasks.md
LogonInstallation_CvsBasics.md
LogonInstallation_InstallationBasics.md
LogonMrs_InformationStructure.md
LogonMrs_MessageRelations.md
LtgOslo_Hank(c3b8).md  # LtgOslo/Hankø
MatrixDoc_WhQ.md
ToolsTop_converter(2e)html.md  # wiki actually had ".html" in the title; already imported as ToolsTop_converter
WeSearch_Berlin.md
WeSearch_Demonstrator.md
WeSearch_FeforTopics.md
WeSearch_Hank(c3b8)Schedule.md  # WeSearch/HankøSchedule
WeSearch_Hank(c3b8)TheRest.md  # WeSearch/HankøTheRest
WeSearch_Interface.md
WeSearch_PestExamples.md
WeSearch_RealisticTextParsing.md
WeSearch_StarSem_MrsCrawling.md
WeSearch_SuperTagging.md
WeSearch_Tokenization.md
WeSearch_TripleStores.md
WeSearch_UberTagging.md
arademaker commented 3 years ago

Thank you @goodmami. Yes, / was converted to _, and GitHub magically renders - as a space. The parentheses are ugly but do not cause any harm. But the reason for the duplications (see #25) is still not clear to me. Some duplications are already in the dump, so they are not an error in the migration. The encoding may have caused some errors in the migration, but we now have a list. I am attaching the list of all pages in the dump that I got from @oepen:

moin.txt

As you noticed, I have already fixed many of the cases above.

arademaker commented 3 years ago

The case of ErgSemantics_NonScopalModifiers.md is interesting. It looks like an important page that we may have lost, but see http://moin.delph-in.net/wiki/ErgSemantics/NonScopalModifiers: this page was deleted in MoinMoin. Actually, ErgSemantics(2f)RelativeClauses was renamed to ErgSemantics(2f)NonScopalModifiers, which was later deleted:

See ErgSemantics\(2f\)NonScopalModifiers/edit-log:

1382547133360546    00000001    SAVENEW ErgSemantics(2f)RelativeClauses 75.146.63.242   75-146-63-242-Washington.hfc.comcastbusiness.net    1101511421.47.55017     
1382549060521974    00000002    SAVE    ErgSemantics(2f)RelativeClauses 75.146.63.242   75-146-63-242-Washington.hfc.comcastbusiness.net    1101511421.47.55017     
1382549150509747    00000003    SAVE    ErgSemantics(2f)RelativeClauses 75.146.63.242   75-146-63-242-Washington.hfc.comcastbusiness.net    1101511421.47.55017     
1382602650699190    00000004    SAVE    ErgSemantics(2f)RelativeClauses 93.206.0.159    p5DCE009F.dip0.t-ipconnect.de   1098876287.95.17133     
1405018957939738    00000005    SAVE    ErgSemantics(2f)RelativeClauses 87.162.226.112  p57A2E270.dip0.t-ipconnect.de   1098876287.95.17133     
1415232274472863    00000006    SAVE    ErgSemantics(2f)RelativeClauses 75.146.63.242   75-146-63-242-Washington.hfc.comcastbusiness.net    1101511421.47.55017     
1433437370469623    00000007    SAVE    ErgSemantics(2f)RelativeClauses 75.146.63.242   75-146-63-242-Washington.hfc.comcastbusiness.net    1101511421.47.55017     
1450478308547806    00000008    SAVE    ErgSemantics(2f)RelativeClauses 174.21.159.201  174-21-159-201.tukw.qwest.net   1101511421.47.55017     A first attempt at talking about intersective modification as a `phenomenon'
1450478469201052    00000009    SAVE    ErgSemantics(2f)RelativeClauses 174.21.159.201  174-21-159-201.tukw.qwest.net   1101511421.47.55017     Noting references I haven't yet looked through
1450733435563410    00000010    SAVE/RENAME ErgSemantics(2f)NonScopalModifiers  193.157.186.127 1x-193-157-186-127.uio.no   1098876287.95.17133 ErgSemantics/RelativeClauses    per ESD decision
1450734224521613    00000011    SAVE    ErgSemantics(2f)NonScopalModifiers  174.21.159.201  174-21-159-201.tukw.qwest.net   1101511421.47.55017     
1453307751475650    00000012    SAVE    ErgSemantics(2f)NonScopalModifiers  174.21.160.48   174-21-160-48.tukw.qwest.net    1101511421.47.55017     Typographic conventions
1453307888518481    00000013    SAVE    ErgSemantics(2f)NonScopalModifiers  174.21.160.48   174-21-160-48.tukw.qwest.net    1101511421.47.55017     
1453308154249605    00000014    SAVE    ErgSemantics(2f)NonScopalModifiers  174.21.160.48   174-21-160-48.tukw.qwest.net    1101511421.47.55017     Revise to fully embrace ‘non-scopal modifiers’ as the phenomenon name
1453308395419146    00000015    SAVE    ErgSemantics(2f)NonScopalModifiers  174.21.160.48   174-21-160-48.tukw.qwest.net    1101511421.47.55017     Further edits based on notes from last ESD meeting
1453308624544911    00000016    SAVE    ErgSemantics(2f)NonScopalModifiers  174.21.160.48   174-21-160-48.tukw.qwest.net    1101511421.47.55017     And one last thing
1453318301298743    00000017    SAVE    ErgSemantics(2f)NonScopalModifiers  174.21.160.48   174-21-160-48.tukw.qwest.net    1101511421.47.55017     
1453822209279284    00000018    SAVE    ErgSemantics(2f)NonScopalModifiers  193.157.184.226 1x-193-157-184-226.uio.no   1098876287.95.17133     per request by emily
emilymbender commented 3 years ago

That last comment looks like one by @oepen and at a guess we decided to delete the page/merge the content elsewhere.

arademaker commented 3 years ago

One more crazy page is 4-16_Meeting_Notes.md. In the original dump I have 4\(2d\)16_Meeting_Notes/, but during the migration I had to run a local MoinMoin instance in Docker, which served as the endpoint from which another script fetched the contents and produced the markdown files for this new wiki. It looks like this new instance created the empty 4\(28\)2d\(29\)16_Meeting_Notes/ file.

This new file is not a big problem: it is empty, and even if it generates an empty page here, we can easily delete it. The original page 4\(2d\)16_Meeting_Notes/ was deleted in MoinMoin: the current version is 00000004, but the last revision with content is 00000003. The log, however, says nothing about it:

1177188397000000    00000001    SAVENEW 4(2d)16_Meeting_Notes   71.35.116.39    71-35-116-39.tukw.qwest.net 1176767071.26.36927     
1177188634000000    00000002    SAVE    4(2d)16_Meeting_Notes   71.35.116.39    71-35-116-39.tukw.qwest.net 1176767071.26.36927     
1177188730000000    00000003    SAVE    4(2d)16_Meeting_Notes   71.35.116.39    71-35-116-39.tukw.qwest.net 1176767071.26.36927     
1281376152000000    00000004    SAVE    4(2d)16_Meeting_Notes   84.208.94.211   cm-84.208.94.211.getinternet.no 1098876287.95.17133     
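
The first column of these log lines looks like a timestamp in microseconds since the Unix epoch. That reading is an assumption, not something stated in the dump, but under it the entries can be dated with a small Python sketch (log_time is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def log_time(usecs: str) -> datetime:
    """Convert an edit-log timestamp (assumed microseconds since epoch) to UTC."""
    return EPOCH + timedelta(microseconds=int(usecs))

# First and last timestamps from the log lines above.
first = log_time("1177188397000000")
last = log_time("1281376152000000")
print(first.isoformat())
print(last.isoformat())
```

If the assumption holds, the first three revisions were saved within minutes of each other and revision 00000004 followed years later.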

So for me, nothing is wrong here. The page does not appear in http://moin.delph-in.net/wiki/OsloScopalNonScopal?action=fullsearch&context=180&value=notes&titlesearch=Titles, one more clue that it was deleted. The content of revision 00000003 looks like a draft anyway:

== slot/proto-morpheme whatchamacallit ==

'''Content'''
   * morphosyntactic categories
      * portmanteau
      * range of values
      * unhandled "dummy"

'''Context'''
   * order
   * dependencies
      * category missing (e.g. don't mark person on infinitives)
      * dependent choices (e.g. neg gets different mood)
   * optionality
      * easy
      * multipaths
   * iterability
      * *fix

But then in the current wiki I found https://github.com/delph-in/docs/wiki/notes. The name is not very informative, and it looks like a duplicate of https://github.com/delph-in/docs/wiki/OsloScopalNonScopal, but the two are not identical. This is what I found in git whatchanged:

887179315996ba05848a18c2b35506eee8c4f61b Rough notes, speakers are encouraged to read & edit.
:000000 100644 00000000 dc378ff3 A      notes.md

and the same message appears in the history at http://moin.delph-in.net/wiki/OsloScopalNonScopal?action=info. So notes was the initial name of OsloScopalNonScopal. It looks like the migration process, when retrieving the page history, had trouble with pages that had been renamed. As we can see in the git log:

ar@tenis docs.wiki % git log --format='%H %an %s' -- notes.md
887179315996ba05848a18c2b35506eee8c4f61b EmilyBender Rough notes, speakers are encouraged to read & edit.

ar@tenis docs.wiki % git log --format='%H %an %s' -- OsloScopalNonScopal.md
c4abe4a1952ce48117e77d6a5b8dfadc2ca02f96 EmilyBender Adding notes from SIG
51056273c26c3393d306220ced63070810ee8b7e GlennSlayden Add/Update OsloScopalNonScopal.md
683a77b7585d8b4cf6e3917100b4dcc1d4d796d6 StephanOepen Add/Update OsloScopalNonScopal.md
ef4ab6cb5451ab4fbf23bc0b9fed6f9599c23241 StephanOepen per request by emily

To preserve the history of the changes, the migration process created notes.md. But the page was later renamed, and instead of deleting notes.md, the migration just created a new file with the new name. I have now deleted notes.md.

arademaker commented 3 years ago

I realize that it was a mistake on my side not to have detected all these details during the migration. I am sorry for that. But no content was lost: I do have the dump, and we do have MoinMoin running in read-only mode. I still believe that for the majority of the pages, the final result is fine. So maybe we just need to be aware of those problems and try to solve the issues as we find them?

emilymbender commented 3 years ago

The migration is such a huge job, @arademaker ! Thank you for taking it on.

I think that notes.md file was indeed spurious, and I see that OsloScopalNonScopal has survived the transition. It's too bad that the 'delete' actions aren't apparent (at least as far as I can tell) in the migrated data.

arademaker commented 3 years ago

The deletion of notes.md was done by me now, locally:

% pwd
/Users/ar/hpsg/documentation/docs.wiki
% git whatchanged
8a29e9c8da582a0f71793895e11b0b2eaafaf545 (HEAD -> master, origin/master) deleted file that was renamed. See #18
:100644 000000 dc378ff3 00000000 D      notes.md
...

The good news is that we do have a way to find all pages in MoinMoin that were renamed:

find . -name edit-log | xargs awk '$3 ~ /RENAME/ {print FILENAME,$2,"new: " $4,"old: " $8}'
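
A rough Python equivalent of that one-liner, in case it needs extending. It assumes the edit-log fields are tab-separated, with the action in the third field, the new name in the fourth and the old name in the eighth; the function names are hypothetical:

```python
from pathlib import Path

def renames_in_log(text: str):
    """Yield (revision, new_name, old_name) for each rename entry in a log."""
    for line in text.splitlines():
        fields = line.split("\t")  # assumption: tab-separated fields
        if len(fields) >= 8 and "RENAME" in fields[2]:
            yield fields[1], fields[3], fields[7]

def renames_under(pages_dir: str):
    """Scan every edit-log below pages_dir and yield its rename entries."""
    for log in Path(pages_dir).rglob("edit-log"):
        text = log.read_text(encoding="utf-8", errors="replace")
        for rev, new, old in renames_in_log(text):
            yield log.parent.name, rev, new, old

# One entry copied from the edit-log shown earlier, re-joined with tabs.
sample = "\t".join([
    "1450733435563410", "00000010", "SAVE/RENAME",
    "ErgSemantics(2f)NonScopalModifiers", "193.157.186.127",
    "1x-193-157-186-127.uio.no", "1098876287.95.17133",
    "ErgSemantics/RelativeClauses", "per ESD decision",
])
found = list(renames_in_log(sample))
print(found)
```

Splitting on tabs rather than on any whitespace keeps the field positions stable when a field such as the user id is empty.
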
arademaker commented 3 years ago

For

./SynSem(2f)Activities(2f)DependentDimensions/edit-log 00000010 new: SynSem(2f)Activities(2f)DependentDimensions old: SynSem/DependentDimensions

image

I just deleted the second one in the screenshot above. The old one that was renamed.

emilymbender commented 3 years ago

Is it possible to tell which pages were deleted during the MoinMoin days, though?

arademaker commented 3 years ago

Hmm, yes. For pages that were actually deleted, MoinMoin represents the deletion by increasing the version number without creating an actual revision in the revisions subdirectory. Each page is represented as:

% tree MatrixDocTop
MatrixDocTop
├── cache
│   └── pagelinks
├── current
├── edit-log
└── revisions
    ├── 00000001
    ├── 00000002
    ├── 00000003
    ├── 00000004
    ├── 00000005
    ├── 00000006
    ├── 00000007
    ├── 00000008
     ....

ar@tenis pages % cat MatrixTop/current
00000042

So if a page is deleted, the content of the current file will be a number that does not correspond to any file in the revisions subfolder. See http://moinmo.in/HelpOnPageDeletion
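
The check described above can be sketched in Python. The directory layout follows the tree shown earlier; the function names and the two synthetic pages in the demo are made up for illustration and are not real wiki data:

```python
import tempfile
from pathlib import Path

def is_deleted(page_dir: Path) -> bool:
    """True if `current` names a revision with no file under revisions/."""
    current = (page_dir / "current").read_text().strip()
    return not (page_dir / "revisions" / current).is_file()

def deleted_pages(pages_dir: Path) -> list[str]:
    """Names of page directories whose current revision file is missing."""
    return sorted(
        p.name for p in pages_dir.iterdir()
        if (p / "current").is_file() and is_deleted(p)
    )

# Tiny synthetic layout for demonstration only.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, n_revs, current in [("LivePage", 2, "00000002"),
                                  ("GonePage", 3, "00000004")]:
        rev_dir = root / name / "revisions"
        rev_dir.mkdir(parents=True)
        for n in range(1, n_revs + 1):
            (rev_dir / f"{n:08d}").write_text("content")
        (root / name / "current").write_text(current + "\n")
    found = deleted_pages(root)

print(found)
```

Pages without a current file are skipped rather than reported, since the thread does not say how such directories should be treated.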

So the list of pages DELETED in MoinMoin is below. The renamed ones are not included:

venue(2d)map(2e)png
SuquamishCommunityHouse
StandingTop
StandingGroup
ShortCLIMB
PgAccess
PetEvolution
PestTop
ParisCards
ParallelCorp
MWEs_and_Idiomatic_Expressions
LogonMrs(2f)MessageRelations
LogonMrs(2f)InformationStructure
LogonInstallation(2f)InstallationBasics
LogonInstallation(2f)CvsBasics
LkbSmaf
LkbLexDbPsqlInitialize
LkbLexDbInitialize
LkbDownload
LicensingChoices
LexDbPgAccess
LexDB_Internals
LapDevelopment(2f)Tasks
LapDevelopment(2f)SeverDeployment
LapDevelopment(2f)Abel
KyotoFutureSummitSuggestions
ItsdbTreebanking(2f)ItsdbTrouble
Initialize_LexDB
ErgSemanticsTemplate
ErgSemantics(2f)RunOnConstruction
ErgSemantics(2f)NonScopalModifiers
ErgSemantics(2f)Fundamentals
ErgProcessing(2f)ExportExample
Deepbank
ClarinoTop
ClarinoTop(2f)TechnologySurvey
ClarinoTop(2f)RequirementsSurvey
ClarinoTop(2f)RelatedWork
BarcelonaWishlist
4(2d)16_Meeting_Notes
emilymbender commented 3 years ago

I see ErgSemantics(2f)NonScopalModifiers there, confirming our decision to delete it in the github wiki.

arademaker commented 3 years ago

Ah, I now see your point, @emilymbender. My comment https://github.com/delph-in/docs/issues/18#issuecomment-922029311 was wrong (I just edited it). The page ErgSemantics(2f)RelativeClauses was renamed to ErgSemantics(2f)NonScopalModifiers, and the latter was later deleted.

arademaker commented 2 years ago

The page http://moin.delph-in.net/wiki/LkbLexDb shows:

last edited 2011-10-08 21:12:12 by localhost

But the page https://github.com/delph-in/docs/wiki/LkbLexDB shows:

StephanOepen edited this page on Jan 13, 2009

This is very weird, since the page in this wiki is older than the page in the original frozen MoinMoin installation. The contents differ too. In the dump, the current file points to version 00000009, but this page in MoinMoin has 00000035 as the last revision.

% cat dump/ltg/moin/delphin/data/pages/LkbLexDb/current
00000009
% ls dump/ltg/moin/delphin/data/pages/LkbLexDb/revisions
00000001    00000005    00000009    00000013    00000017    00000021    00000025    00000029    00000033
00000002    00000006    00000010    00000014    00000018    00000022    00000026    00000030    00000034
00000003    00000007    00000011    00000015    00000019    00000023    00000027    00000031    00000035
00000004    00000008    00000012    00000016    00000020    00000024    00000028    00000032
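
To look for other pages in the same state, one could compare each page's current pointer with the highest-numbered file in revisions. This is only a sketch with hypothetical function names and a synthetic page that mirrors the numbers above; whether a lagging pointer means a revert, a problem with the dump, or something else is not established here:

```python
import tempfile
from pathlib import Path

def pointer_status(page_dir: Path) -> tuple[str, str]:
    """Return (current pointer, highest revision file name) for one page."""
    current = (page_dir / "current").read_text().strip()
    # Zero-padded names sort correctly as plain strings.
    revisions = sorted(p.name for p in (page_dir / "revisions").iterdir())
    return current, (revisions[-1] if revisions else "")

def lagging_pages(pages_dir: Path) -> list[str]:
    """Pages whose current pointer is lower than their highest revision."""
    result = []
    for page in sorted(pages_dir.iterdir()):
        if (page / "current").is_file() and (page / "revisions").is_dir():
            current, latest = pointer_status(page)
            if latest and current < latest:
                result.append(page.name)
    return result

# Synthetic page for demonstration: 35 revisions, pointer at 00000009.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "ExamplePage"
    (page / "revisions").mkdir(parents=True)
    for n in range(1, 36):
        (page / "revisions" / f"{n:08d}").write_text("content")
    (page / "current").write_text("00000009\n")
    status = pointer_status(page)
    lagging = lagging_pages(Path(tmp))

print(status, lagging)
```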