Open emilymbender opened 3 years ago
I have manually copied the page and converted it to markdown: https://github.com/delph-in/docs/wiki/CambridgeSEM-I. You are right, my code must have missed that page, either because of the character that looks like a hyphen in the name or because MoinMoin had two very similar pages:
CambridgeSEM\(28\)2d\(29\)I/ CambridgeSEM\(2d\)I/
The first was deleted, so maybe I did something wrong! Thank you for opening the issue. I fixed the links in https://github.com/delph-in/docs/wiki/CambridgeSchedule and https://github.com/delph-in/docs/wiki/RmrsDiscussions (the latter needs some editing to improve its format).
and also see if maybe there are others?
Not sure at this stage how I can check that. In the dump from MoinMoin I have 1266 pages:
% ls | wc -l
1266
In the current wiki we have 1057 pages:
% ls | wc -l
1057
But many MoinMoin pages were intentionally removed:
We do have some weird names in the new wiki, but the content looks right:
% ls | rg "[^a-zA-Z0-9_.]"
CambridgeSEM-I.md
LtgOslo_Hank(c3b8).md
LtgOslo_Hank(c3b8)Retreat.md
MatrixDoc_Nominalized(20)Clauses.md
Saabr(c3bc)ckenTop.md
Saarbr(c3bc)ckenTop.md
SideSt(c3b8)rrelse.md
Singapore(20)Top.md
TheAbbey_Chrysalis2014PpAttachment(5d).md
ToolsTop_converter(2e)html.md
Usability_ease(20)of(20)set(2d)up.md
In the MoinMoin dump we have:
% ls | rg "[^a-zA-Z0-9_.]"
(28)c396(29)nskadeSidor
(28)c396(29)vergivnaSidor
(28)c398(29)nskedeSider
(c396)nskadeSidor
(c396)vergivnaSidor
(c398)nskedeSider
4(28)2d(29)16_Meeting_Notes
4(2d)16_Meeting_Notes
Aktuelle(28)c384(29)nderungen
Aktuelle(c384)nderungen
Anv(28)c3a4(29)ndarInst(28)c3a4(29)llningar
Anv(c3a4)ndarInst(c3a4)llningar
CambridgeSEM(28)2d(29)I
CambridgeSEM(2d)I
ChangementsR(28)c3a9(29)cents
ChangementsR(c3a9)cents
ClarinoTop(2f)RelatedWork
ClarinoTop(2f)RequirementsSurvey
ClarinoTop(2f)TechnologySurvey
Climb(2f)GClimb
Climb(2f)GClimb(2f)German
DeepBank(2f)OneOne
DeepBank(2f)OneZero
DelphinTutorial(2f)Distributions
DelphinTutorial(2f)Formalisms
DelphinTutorial(2f)Grammars
DelphinTutorial(2f)Processing
ErgProcessing(2f)ExportExample
ErgProcessing(2f)SampleExport
ErgSemantics(2f)Apposition
ErgSemantics(2f)Basics
ErgSemantics(2f)Ccs
ErgSemantics(2f)Comparatives
ErgSemantics(2f)Compounding
ErgSemantics(2f)Conditionals
ErgSemantics(2f)ControlRelations
ErgSemantics(2f)Conventions
ErgSemantics(2f)Coordination
ErgSemantics(2f)Design
ErgSemantics(2f)Discovery
ErgSemantics(2f)Ellipsis
ErgSemantics(2f)Essence
ErgSemantics(2f)ForeignExpressions
ErgSemantics(2f)Fragments
ErgSemantics(2f)Fundamentals
ErgSemantics(2f)HowToCite
ErgSemantics(2f)IdentityCopulae
ErgSemantics(2f)Imperatives
ErgSemantics(2f)ImplicitLocatives
ErgSemantics(2f)ImplicitNominals
ErgSemantics(2f)ImplicitQuantifiers
ErgSemantics(2f)InstrumentalRelatives
ErgSemantics(2f)Interface
ErgSemantics(2f)Internals
ErgSemantics(2f)Inventory
ErgSemantics(2f)MeasurePhrases
ErgSemantics(2f)Nominalization
ErgSemantics(2f)NonAdverbialClausalModifiers
ErgSemantics(2f)NonScopalModifiers
ErgSemantics(2f)Notes
ErgSemantics(2f)NumberSequences
ErgSemantics(2f)Parentheticals
ErgSemantics(2f)Partitives
ErgSemantics(2f)Predicates
ErgSemantics(2f)PropositionalArguments
ErgSemantics(2f)Quantification
ErgSemantics(2f)QuasiModalInfinitivals
ErgSemantics(2f)RelationalNouns
ErgSemantics(2f)RunOnConstruction
ErgSemantics(2f)Template
ErgSemantics(2f)Terminology
ErgSemantics(2f)TimeExpressions
ErgSemantics(2f)ToDo
ErgSemantics(2f)Vocatives
ErgTokenization(2f)ComplexExample
EventStats(2f)HitCounts
EventStats(2f)UserAgents
F(28)c3b6(29)r(28)c3a4(29)ldrarl(28)c3b6(29)saSidor
F(c3b6)r(c3a4)ldrarl(c3b6)saSidor
FeforPlenum(2f)Formalism
FeforPlenum(2f)LexicalAcquisitionImmatureGrammars
For(28)c3a6(29)ldrel(28)c3b8(29)seSider
For(c3a6)ldrel(c3b8)seSider
ItsdbTreebanking(2f)ItsdbAnnotation
ItsdbTreebanking(2f)ItsdbExporting
ItsdbTreebanking(2f)ItsdbModeling
ItsdbTreebanking(2f)ItsdbTrouble
ItsdbTreebanking(2f)ItsdbUpdating
ItsdbTsdb(2f)ProcessingRelations
JimWhite(2f)StarSemTokenTabulation
KyotoSchedule(2f)InterDelphinNotes
KyotoTop(2f)InterWiki
LapDevelopment(2f)Abel
LapDevelopment(2f)Accounting
LapDevelopment(2f)Annotations
LapDevelopment(2f)Blog
LapDevelopment(2f)DKPROCompilation
LapDevelopment(2f)Deployment
LapDevelopment(2f)DkPro
LapDevelopment(2f)Environment
LapDevelopment(2f)Giellatekno
LapDevelopment(2f)Hackathons
LapDevelopment(2f)Internals
LapDevelopment(2f)Library
LapDevelopment(2f)MongoDB
LapDevelopment(2f)Production
LapDevelopment(2f)Schedule
LapDevelopment(2f)ServerDeployment
LapDevelopment(2f)SeverDeployment
LapDevelopment(2f)Status
LapDevelopment(2f)Tasks
LapDevelopment(2f)Tests
LapDevelopment(2f)ToE
LapDevelopment(2f)Tree
LogonInstallation(2f)CvsBasics
LogonInstallation(2f)InstallationBasics
LogonMrs(2f)InformationStructure
LogonMrs(2f)MessageRelations
LogonProcessing(2f)BatchGeneration
LogonProcessing(2f)BatchParsing
LogonProcessing(2f)BatchTranslation
LogonTest(2f)BenchmarkingSuite
LtgOslo(2f)BibTeX
LtgOslo(2f)Cristin
LtgOslo(2f)Delphin
LtgOslo(2f)EndreAalrust
LtgOslo(2f)Goals
LtgOslo(2f)Hank(28)c3b8(29)
LtgOslo(2f)Hank(c3b8)
LtgOslo(2f)LaTeX
LtgOslo(2f)Linux
LtgOslo(2f)MSc
LtgOslo(2f)MajaBuljan
LtgOslo(2f)MarteSvalastoga
LtgOslo(2f)Norsk
LtgOslo(2f)Oscarsborg
LtgOslo(2f)TechTalks
LtgOslo(2f)TechTalks16
LtgOslo(2f)TechTalksH2016
LtgOslo(2f)WorkDuties
MatrixDoc(2f)AdnominalPossession
MatrixDoc(2f)ArgumentOptionality
MatrixDoc(2f)Case
MatrixDoc(2f)ClausalComplements
MatrixDoc(2f)ClausalModifiers
MatrixDoc(2f)Coordination
MatrixDoc(2f)DirectInverse
MatrixDoc(2f)Evidentials
MatrixDoc(2f)Gender
MatrixDoc(2f)GeneralInfo
MatrixDoc(2f)ImportToolboxLexicon
MatrixDoc(2f)InformationStructure
MatrixDoc(2f)Lexicon
MatrixDoc(2f)Morphology
MatrixDoc(2f)NominalizedClauses
MatrixDoc(2f)Number
MatrixDoc(2f)OtherFeatures
MatrixDoc(2f)Person
MatrixDoc(2f)SententialNegation
MatrixDoc(2f)TenseAspectMood
MatrixDoc(2f)TestByGeneration
MatrixDoc(2f)TestSentences
MatrixDoc(2f)WhQ
MatrixDoc(2f)WordOrder
MatrixDoc(2f)YesNoQ
MoinMoin(2f)InstallDocs
MoinMoin(2f)InstallationsAnleitung
MoinMoin(2f)TextFormatting
MtJaen(2f)MtJaenTanaka
OpenissuesTop(2f)GrammarMatrixClitic
OpenissuesTop(2f)GrammarMatrixSerialVerbConstructions
OpenissuesTop(2f)GrammarMatrixTenseAspect
PageAl(28)c3a9(29)atoire
PageAl(c3a9)atoire
PagesAbandonn(28)c3a9(29)es
PagesAbandonn(c3a9)es
PagesSouhait(28)c3a9(29)es
PagesSouhait(c3a9)es
PhonologyTop(2f)FrenchPhonemes
PhonologyTop(2f)InterWiki
Pr(28)c3a9(29)f(28)c3a9(29)rencesUtilisateur
Pr(c3a9)f(c3a9)rencesUtilisateur
S(28)c3b6(29)kSida
S(c3b6)kSida
SeitenGr(28)c3b6c39f(29)e
SeitenGr(c3b6c39f)e
Senaste(28)c384(29)ndringar
Senaste(c384)ndringar
SideSt(28)c3b8(29)rrelse
SideSt(c3b8)rrelse
Singapore(20)Top
Singapore(28)20(29)Top
SynSem(2f)Activities
SynSem(2f)Activities(2f)AnnotationConsistency
SynSem(2f)Activities(2f)ControlRaising
SynSem(2f)Activities(2f)Coordination
SynSem(2f)Activities(2f)DependentDimensions
SynSem(2f)Activities(2f)ExtrinsicParserEvaluation
SynSem(2f)Activities(2f)Gapping
SynSem(2f)Activities(2f)GramRel
SynSem(2f)Activities(2f)IdentitySyntax
SynSem(2f)Activities(2f)PcdrtEllipsis
SynSem(2f)Activities(2f)PcdrtEllipsis(2f)10Oct2017
SynSem(2f)Activities(2f)PcdrtEllipsis(2f)25Sept2017
SynSem(2f)Activities(2f)PolymorphicVariadicPredicates
SynSem(2f)Activities(2f)UdMeaningConstruction
SynSem(2f)Candidates
SynSem(2f)Impressions
SynSem(2f)Launch
SynSem(2f)LysebuResources
SynSem(2f)MeaningConstruction
SynSem(2f)MeaningRepresentation
SynSem(2f)Planning
SynSem(2f)Problems
SynSem(2f)Problems(2f)ERGQuantification
SynSem(2f)Problems(2f)ScopalNonScopal
SynSem(2f)Problems(2f)UDDeterminers
TheAbbey(2f)Chrysalis2014
TheAbbey(2f)Chrysalis2014Arity
TheAbbey(2f)Chrysalis2014BindingTheory
TheAbbey(2f)Chrysalis2014DeverbalNouns
TheAbbey(2f)Chrysalis2014Nominalization
TheAbbey(2f)Chrysalis2014OpenEndedPredicates
TheAbbey(2f)Chrysalis2014PossessiveIdioms
TheAbbey(2f)Chrysalis2014PpAttachment
TheAbbey(2f)Chrysalis2014ProperNouns
TheAbbey(2f)Chrysalis2014ProperNounsGeneration
TheAbbey(2f)Chrysalis2014SchrodingerMrs
TheAbbey(2f)Chrysalis2014Terminology
TheAbbey(2f)Chrysalis2014WhatsThePoint
Tilf(28)c3a6(29)ldigSide
Tilf(c3a6)ldigSide
ToolsTop(2f)converter(28)2e(29)html
ToolsTop(2f)converter(2e)html
Tu(28)e1baa5(29)nAnhL(28)c3aa(29)
Tu(e1baa5)nAnhL(c3aa)
TuanAnhLe(2f)GramEng4Dummies
WeSearch(2f)Adaptation
WeSearch(2f)Adaptation(2f)Background
WeSearch(2f)AnalysisCatalog
WeSearch(2f)Berlin
WeSearch(2f)Ccs
WeSearch(2f)CcsDayOne
WeSearch(2f)CcsDayThree
WeSearch(2f)CcsDayTwo
WeSearch(2f)ChartPruning
WeSearch(2f)DataCollection
WeSearch(2f)Demonstrator
WeSearch(2f)DescriptiveStatistics
WeSearch(2f)DesignPrinciples
WeSearch(2f)DocumentParsing
WeSearch(2f)FeforTopics
WeSearch(2f)Hank(28)c3b8(29)Schedule
WeSearch(2f)Hank(28)c3b8(29)TheRest
WeSearch(2f)Hank(c3b8)Schedule
WeSearch(2f)Hank(c3b8)TheRest
WeSearch(2f)ICONS
WeSearch(2f)Interface
WeSearch(2f)LexicalFiltering
WeSearch(2f)ParserAdaptation
WeSearch(2f)ParserEvaluation
WeSearch(2f)PestExamples
WeSearch(2f)QueryLanguage
WeSearch(2f)Rdf
WeSearch(2f)ReadingGroup
WeSearch(2f)RealisticTextParsing
WeSearch(2f)Resa
WeSearch(2f)ScopalArgCoord
WeSearch(2f)SentenceSegmentation
WeSearch(2f)StarSem
WeSearch(2f)StarSem(2f)MrsCrawling
WeSearch(2f)StarSem(2f)MrsCrawlingEvaluation
WeSearch(2f)StarSem(2f)MrsCrawlingOracle
WeSearch(2f)StarSem(2f)MrsReadingGroup
WeSearch(2f)StarSem(2f)UiO
WeSearch(2f)SuperTagging
WeSearch(2f)SuperTagging(2f)Setup
WeSearch(2f)Tokenization
WeSearch(2f)TripleStores
WeSearch(2f)UberTagging
WeSearch(2f)UnderspecifedAttachment
WeSearch(2f)UnderspecifiedPreds
WeSearch(2f)VariablePropertySharing
WikiSandL(28)c3a5(29)da
WikiSandL(c3a5)da
https(3a2f2f)students(2e)washington(2e)edu(2f)olzama(2f)ge
venue(28)2d(29)map(28)2e(29)png
venue(2d)map(2e)png
Some are garbage in MoinMoin; see the last two, whose content is an image. Many pages were correctly imported but renamed, e.g. from MatrixDoc(2f)Lexicon to https://github.com/delph-in/docs/wiki/MatrixDoc_Lexicon (because MoinMoin supported subpages). Many pages under the WeSearch prefix were protected and not imported.
One more case similar to the CambridgeSEM-I page:
ToolsTop(2f)converter(28)2e(29)html
ToolsTop(2f)converter(2e)html
I have just manually created https://github.com/delph-in/docs/wiki/ToolsTop_converter.
Pages
LtgOslo(2f)Hank(28)c3b8(29)
LtgOslo(2f)Hank(c3b8)
The first was deleted and the second is protected in MoinMoin, so I removed them from here:
0e28907d5461412626da14ea103680c31f7ea951 (HEAD -> master, origin/master) Destroyed LtgOslo_Hank(c3b8)Retreat (markdown)
:100644 000000 4df770ab 00000000 D LtgOslo_Hank(c3b8)Retreat.md
63049d2ec3ed739c829debb7623cb210ca533027 Destroyed LtgOslo_Hank(c3b8) (markdown)
:100644 000000 4df770ab 00000000 D LtgOslo_Hank(c3b8).md
Help needed! Can someone see any important page in the lists above that is not in the current wiki?
Pages
Saabr(c3bc)ckenTop.md
Saarbr(c3bc)ckenTop.md
were duplicates (related to #25); I fixed the name and merged the contents into https://github.com/delph-in/docs/wiki/SaabruckenTop.
I think all those (2f) kinds of things appear where whatever converter you used tried to escape punctuation. They are hexadecimal values for ASCII characters (illustrated in Python (sorry) below):
>>> chr(int('2d', 16)) # convert base-16 int to character
'-'
>>> chr(int('2f', 16))
'/'
Although the one for SaarbrückenTop is strange:
>>> chr(int('c3bc', 16))
'쎼'
>>> hex(ord('ü')) # going the other way
'0xfc'
Then the CambridgeSEM\(28\)2d\(29\)I/ vs CambridgeSEM\(2d\)I/ thing is because those escapes were themselves escaped:
>>> chr(int('28', 16))
'('
>>> chr(int('29', 16))
')'
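Putting these observations together: the hex digits inside each pair of parentheses are the UTF-8 bytes of the character, not its code point, which is why c3bc looks strange (it is the two-byte UTF-8 encoding of ü, U+00FC). A minimal sketch of a decoder, assuming every parenthesized group is an even-length run of lowercase hex digits holding UTF-8 bytes; the name unquote_moin is hypothetical and this is not MoinMoin's own code:

```python
import re

# Hypothetical helper (not MoinMoin's implementation). Assumes each
# parenthesized group holds hex-encoded UTF-8 bytes, e.g. (2f) -> "/"
# and (c3bc) -> "ü".
ESCAPE = re.compile(r"\(((?:[0-9a-f]{2})+)\)")

def unquote_moin(name: str) -> str:
    """Decode one level of MoinMoin-style escaping in a page name."""
    return ESCAPE.sub(lambda m: bytes.fromhex(m.group(1)).decode("utf-8"), name)

print(unquote_moin("Saarbr(c3bc)ckenTop"))      # SaarbrückenTop
print(unquote_moin("SeitenGr(c3b6c39f)e"))      # SeitenGröße
print(unquote_moin("MatrixDoc(2f)Lexicon"))     # MatrixDoc/Lexicon
# Doubly escaped names need two passes: the first only restores "(" and ")".
print(unquote_moin("CambridgeSEM(28)2d(29)I"))  # CambridgeSEM(2d)I
print(unquote_moin(unquote_moin("CambridgeSEM(28)2d(29)I")))  # CambridgeSEM-I
```

Decoding the bytes as UTF-8, rather than passing the hex to chr(), is what turns c3bc into ü instead of 쎼.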
It looks like all the ones with only (2f) (/) used _ instead and are imported already. The ones with dashes (CambridgeSEM-I) are presented in the browser with the dash as a space (see here). With this in mind I whittled down your list a bit. I don't have the Moin dump, so I copied your file list above as moin.txt, then created two normalized lists of files like this:
$ cat moin.txt | sed -e 's/(2f)/_/g' -e 's/(2d)/-/g' -e 's/$/.md/' | sort > moin-norm.txt
$ ls | grep "[^a-zA-Z0-9.]" | sort > current.txt
Then I can find which ones are not already ported:
$ comm -2 -3 moin-norm.txt current.txt # -2 hides lines only in current.txt, -3 hides lines in both: show lines unique to moin-norm.txt
It produces the following list, which I have manually sorted and annotated:
# System pages (I'm just guessing for the non-English titles)
(28)c396(29)nskadeSidor.md
(28)c396(29)vergivnaSidor.md
(28)c398(29)nskedeSider.md
Aktuelle(28)c384(29)nderungen.md
Aktuelle(c384)nderungen.md
Anv(28)c3a4(29)ndarInst(28)c3a4(29)llningar.md
Anv(c3a4)ndarInst(c3a4)llningar.md
(c396)nskadeSidor.md
(c396)vergivnaSidor.md
(c398)nskedeSider.md
ChangementsR(28)c3a9(29)cents.md
ChangementsR(c3a9)cents.md
F(28)c3b6(29)r(28)c3a4(29)ldrarl(28)c3b6(29)saSidor.md
F(c3b6)r(c3a4)ldrarl(c3b6)saSidor.md
For(28)c3a6(29)ldrel(28)c3b8(29)seSider.md
For(c3a6)ldrel(c3b8)seSider.md
MoinMoin_InstallationsAnleitung.md
MoinMoin_InstallDocs.md
MoinMoin_TextFormatting.md
PageAl(28)c3a9(29)atoire.md
PageAl(c3a9)atoire.md
PagesAbandonn(28)c3a9(29)es.md
PagesAbandonn(c3a9)es.md
PagesSouhait(28)c3a9(29)es.md
PagesSouhait(c3a9)es.md
Pr(28)c3a9(29)f(28)c3a9(29)rencesUtilisateur.md
Pr(c3a9)f(c3a9)rencesUtilisateur.md
S(28)c3b6(29)kSida.md
S(c3b6)kSida.md
SeitenGr(28)c3b6c39f(29)e.md
SeitenGr(c3b6c39f)e.md
Senaste(28)c384(29)ndringar.md
Senaste(c384)ndringar.md
SideSt(28)c3b8(29)rrelse.md
Tilf(28)c3a6(29)ldigSide.md
Tilf(c3a6)ldigSide.md
WikiSandL(28)c3a5(29)da.md
WikiSandL(c3a5)da.md
# Personal pages or accidental (?) pages
https(3a2f2f)students(2e)washington(2e)edu_olzama_ge.md
LtgOslo_Cristin.md
Tu(28)e1baa5(29)nAnhL(28)c3aa(29).md
Tu(e1baa5)nAnhL(c3aa).md
venue(28)2d(29)map(28)2e(29)png.md
venue-map(2e)png.md
Singapore(28)20(29)Top.md # see SingaporeTop
# Other duplicates from bad escaping
4(28)2d(29)16_Meeting_Notes.md
CambridgeSEM(28)2d(29)I.md
LtgOslo_Hank(28)c3b8(29).md
ToolsTop_converter(28)2e(29)html.md
WeSearch_Hank(28)c3b8(29)Schedule.md
WeSearch_Hank(28)c3b8(29)TheRest.md
# Potentially good pages; some already converted
4-16_Meeting_Notes.md
ClarinoTop_RelatedWork.md
ClarinoTop_RequirementsSurvey.md
ClarinoTop_TechnologySurvey.md
ErgProcessing_ExportExample.md
ErgSemantics_Fundamentals.md
ErgSemantics_NonScopalModifiers.md
ErgSemantics_RunOnConstruction.md
ItsdbTreebanking_ItsdbTrouble.md
KyotoTop_InterWiki.md
LapDevelopment_Abel.md
LapDevelopment_SeverDeployment.md
LapDevelopment_Tasks.md
LogonInstallation_CvsBasics.md
LogonInstallation_InstallationBasics.md
LogonMrs_InformationStructure.md
LogonMrs_MessageRelations.md
LtgOslo_Hank(c3b8).md # LtgOslo/Hankø
MatrixDoc_WhQ.md
ToolsTop_converter(2e)html.md # wiki actually had ".html" in the title; already imported as ToolsTop_converter
WeSearch_Berlin.md
WeSearch_Demonstrator.md
WeSearch_FeforTopics.md
WeSearch_Hank(c3b8)Schedule.md # WeSearch/HankøSchedule
WeSearch_Hank(c3b8)TheRest.md # WeSearch/HankøTheRest
WeSearch_Interface.md
WeSearch_PestExamples.md
WeSearch_RealisticTextParsing.md
WeSearch_StarSem_MrsCrawling.md
WeSearch_SuperTagging.md
WeSearch_Tokenization.md
WeSearch_TripleStores.md
WeSearch_UberTagging.md
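The sed/comm pipeline above can also be sketched with Python sets, which sidesteps comm's requirement that both inputs be sorted in the same order. This sketch assumes the moin.txt and current.txt files from above with one name per line, and applies only the (2f) -> _ and (2d) -> - rules from the sed command; the function names are hypothetical:

```python
from pathlib import Path

def normalize(moin_name: str) -> str:
    """Mirror the sed command: (2f) -> _, (2d) -> -, then append .md."""
    return moin_name.replace("(2f)", "_").replace("(2d)", "-") + ".md"

def not_yet_ported(moin_names, current_names):
    """Return the normalized MoinMoin names that are absent from the current wiki."""
    current = set(current_names)
    return sorted({normalize(n) for n in moin_names} - current)

def not_yet_ported_from_files(moin_path="moin.txt", current_path="current.txt"):
    """Same as not_yet_ported, reading one name per line from the two files."""
    def read(path):
        text = Path(path).read_text(encoding="utf-8")
        return [line for line in text.splitlines() if line]
    return not_yet_ported(read(moin_path), read(current_path))
```

For example, print("\n".join(not_yet_ported_from_files())) should print the same names as the comm command, sorted by code point.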
Thank you @goodmami, yes: / was converted to _, and GitHub magically renders - as a space. The parentheses are ugly but do no harm. However, the reason for the duplicates (see #25) is still not clear to me. Some duplicates are already in the dump, so they are not an error in the migration. The encoding may have caused some errors in the migration, but we now have a list. I am attaching the list of all pages in the dump that I got from @oepen:
As you noticed, I have already fixed many of the cases above.
The case of ErgSemantics_NonScopalModifiers.md is interesting. It looks like an important page that we may have lost, but see http://moin.delph-in.net/wiki/ErgSemantics/NonScopalModifiers: this page was deleted in MoinMoin. Actually, ErgSemantics(2f)RelativeClauses was renamed to ErgSemantics(2f)NonScopalModifiers and later deleted. See ErgSemantics\(2f\)NonScopalModifiers/edit-log:
1382547133360546 00000001 SAVENEW ErgSemantics(2f)RelativeClauses 75.146.63.242 75-146-63-242-Washington.hfc.comcastbusiness.net 1101511421.47.55017
1382549060521974 00000002 SAVE ErgSemantics(2f)RelativeClauses 75.146.63.242 75-146-63-242-Washington.hfc.comcastbusiness.net 1101511421.47.55017
1382549150509747 00000003 SAVE ErgSemantics(2f)RelativeClauses 75.146.63.242 75-146-63-242-Washington.hfc.comcastbusiness.net 1101511421.47.55017
1382602650699190 00000004 SAVE ErgSemantics(2f)RelativeClauses 93.206.0.159 p5DCE009F.dip0.t-ipconnect.de 1098876287.95.17133
1405018957939738 00000005 SAVE ErgSemantics(2f)RelativeClauses 87.162.226.112 p57A2E270.dip0.t-ipconnect.de 1098876287.95.17133
1415232274472863 00000006 SAVE ErgSemantics(2f)RelativeClauses 75.146.63.242 75-146-63-242-Washington.hfc.comcastbusiness.net 1101511421.47.55017
1433437370469623 00000007 SAVE ErgSemantics(2f)RelativeClauses 75.146.63.242 75-146-63-242-Washington.hfc.comcastbusiness.net 1101511421.47.55017
1450478308547806 00000008 SAVE ErgSemantics(2f)RelativeClauses 174.21.159.201 174-21-159-201.tukw.qwest.net 1101511421.47.55017 A first attempt at talking about intersective modification as a `phenomenon'
1450478469201052 00000009 SAVE ErgSemantics(2f)RelativeClauses 174.21.159.201 174-21-159-201.tukw.qwest.net 1101511421.47.55017 Noting references I haven't yet looked through
1450733435563410 00000010 SAVE/RENAME ErgSemantics(2f)NonScopalModifiers 193.157.186.127 1x-193-157-186-127.uio.no 1098876287.95.17133 ErgSemantics/RelativeClauses per ESD decision
1450734224521613 00000011 SAVE ErgSemantics(2f)NonScopalModifiers 174.21.159.201 174-21-159-201.tukw.qwest.net 1101511421.47.55017
1453307751475650 00000012 SAVE ErgSemantics(2f)NonScopalModifiers 174.21.160.48 174-21-160-48.tukw.qwest.net 1101511421.47.55017 Typographic conventions
1453307888518481 00000013 SAVE ErgSemantics(2f)NonScopalModifiers 174.21.160.48 174-21-160-48.tukw.qwest.net 1101511421.47.55017
1453308154249605 00000014 SAVE ErgSemantics(2f)NonScopalModifiers 174.21.160.48 174-21-160-48.tukw.qwest.net 1101511421.47.55017 Revise to fully embrace ‘non-scopal modifiers’ as the phenomenon name
1453308395419146 00000015 SAVE ErgSemantics(2f)NonScopalModifiers 174.21.160.48 174-21-160-48.tukw.qwest.net 1101511421.47.55017 Further edits based on notes from last ESD meeting
1453308624544911 00000016 SAVE ErgSemantics(2f)NonScopalModifiers 174.21.160.48 174-21-160-48.tukw.qwest.net 1101511421.47.55017 And one last thing
1453318301298743 00000017 SAVE ErgSemantics(2f)NonScopalModifiers 174.21.160.48 174-21-160-48.tukw.qwest.net 1101511421.47.55017
1453822209279284 00000018 SAVE ErgSemantics(2f)NonScopalModifiers 193.157.184.226 1x-193-157-184-226.uio.no 1098876287.95.17133 per request by emily
That last comment looks like one by @oepen and at a guess we decided to delete the page/merge the content elsewhere.
One more crazy page is 4-16_Meeting_Notes.md. In the original dump I have 4\(2d\)16_Meeting_Notes/, but during the migration I had to run a local MoinMoin instance in a Docker container, which served as the endpoint for another script that fetched the contents and produced the markdown files for this new wiki. It looks like this new instance created the empty 4\(28\)2d\(29\)16_Meeting_Notes/ file.
This new file is not a big problem: it is empty, and even if it generates an empty page here, we can easily delete it. The original page 4\(2d\)16_Meeting_Notes/ was deleted in MoinMoin: the current version is 00000004, but the last revision with content is 00000003. But the log says nothing about it:
1177188397000000 00000001 SAVENEW 4(2d)16_Meeting_Notes 71.35.116.39 71-35-116-39.tukw.qwest.net 1176767071.26.36927
1177188634000000 00000002 SAVE 4(2d)16_Meeting_Notes 71.35.116.39 71-35-116-39.tukw.qwest.net 1176767071.26.36927
1177188730000000 00000003 SAVE 4(2d)16_Meeting_Notes 71.35.116.39 71-35-116-39.tukw.qwest.net 1176767071.26.36927
1281376152000000 00000004 SAVE 4(2d)16_Meeting_Notes 84.208.94.211 cm-84.208.94.211.getinternet.no 1098876287.95.17133
So as far as I can tell, nothing is wrong here: the page does not show up in http://moin.delph-in.net/wiki/OsloScopalNonScopal?action=fullsearch&context=180&value=notes&titlesearch=Titles, one more clue that it was deleted. The content of rev 00000003 looks like a draft anyway:
== slot/proto-morpheme whatchamacallit ==
'''Content'''
* morphosyntactic categories
* portmanteau
* range of values
* unhandled "dummy"
'''Context'''
* order
* dependencies
* category missing (e.g. don't mark person on infinitives)
* dependent choices (e.g. neg gets different mood)
* optionality
* easy
* multipaths
* iterability
* *fix
But then, in the current wiki, I found https://github.com/delph-in/docs/wiki/notes. The name is not very informative, and it looks like a duplicate of https://github.com/delph-in/docs/wiki/OsloScopalNonScopal, but they are not identical. In the git whatchanged output I found:
887179315996ba05848a18c2b35506eee8c4f61b Rough notes, speakers are encouraged to read & edit.
:000000 100644 00000000 dc378ff3 A notes.md
and the same message appears in the history at http://moin.delph-in.net/wiki/OsloScopalNonScopal?action=info. So notes was the initial name of OsloScopalNonScopal. It looks like the migration process, when retrieving page histories, had trouble with pages that had been renamed. As we can see in the git log:
ar@tenis docs.wiki % git log --format='%H %an %s' -- notes.md
887179315996ba05848a18c2b35506eee8c4f61b EmilyBender Rough notes, speakers are encouraged to read & edit.
ar@tenis docs.wiki % git log --format='%H %an %s' -- OsloScopalNonScopal.md
c4abe4a1952ce48117e77d6a5b8dfadc2ca02f96 EmilyBender Adding notes from SIG
51056273c26c3393d306220ced63070810ee8b7e GlennSlayden Add/Update OsloScopalNonScopal.md
683a77b7585d8b4cf6e3917100b4dcc1d4d796d6 StephanOepen Add/Update OsloScopalNonScopal.md
ef4ab6cb5451ab4fbf23bc0b9fed6f9599c23241 StephanOepen per request by emily
To preserve the history of changes, the migration process created notes.md. But this file was later renamed, and instead of deleting notes.md, the migration simply created a new file with the new name. I have deleted notes.md now.
I realize it was a mistake on my side not to detect all these details during the migration, and I am sorry for that. But no content was lost: I have the dump, and MoinMoin is still running in read-only mode. I still believe the final result is fine for the majority of pages. So maybe we just need to be aware of these problems and fix the issues as we find them?
The migration is such a huge job, @arademaker ! Thank you for taking it on.
I think that notes.md file was indeed spurious, and I see that OsloScopalNonScopal has survived the transition. It's too bad that the 'delete' actions aren't apparent (at least as far as I can tell) in the migrated data.
I deleted notes.md just now, locally:
% pwd
/Users/ar/hpsg/documentation/docs.wiki
% git whatchanged
8a29e9c8da582a0f71793895e11b0b2eaafaf545 (HEAD -> master, origin/master) deleted file that was renamed. See #18
:100644 000000 dc378ff3 00000000 D notes.md
...
The good news is that we do have a way to find all pages in MoinMoin that were renamed:
find . -name edit-log | xargs awk '$3 ~ /RENAME/ {print FILENAME,$2,"new: " $4,"old: " $8}'
For example:
./SynSem(2f)Activities(2f)DependentDimensions/edit-log 00000010 new: SynSem(2f)Activities(2f)DependentDimensions old: SynSem/DependentDimensions
I just deleted the second one in the screenshot above, the old page that was renamed.
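The awk one-liner splits each edit-log line on whitespace, so an empty field (for example, a missing hostname or user id) would shift $8. A Python sketch of the same search, assuming the MoinMoin 1.x edit-log layout in which fields are tab-separated and the action, new page name and old page name are the 3rd, 4th and 8th fields, as in the log excerpts above; find_renames is a hypothetical name:

```python
from pathlib import Path

def find_renames(pages_dir):
    """Yield (page_dir, revision, new_name, old_name) for every RENAME entry."""
    for log in sorted(Path(pages_dir).glob("*/edit-log")):
        text = log.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            # Assumed field order: time, rev, action, pagename, addr,
            # hostname, userid, extra (the old name on renames), comment.
            fields = line.split("\t")
            if len(fields) >= 8 and "RENAME" in fields[2]:
                yield log.parent.name, fields[1], fields[3], fields[7]
```

Run against the dump's data/pages directory, it should list the same renames as the awk command, with the directory name in place of the edit-log path.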
Is it possible to tell which pages were deleted during the MoinMoin days, though?
Hmm, yes. MoinMoin represents the deletion of a page by incrementing the version number without creating an actual revision in the revisions subdirectory. Each page is stored as:
% tree MatrixDocTop
MatrixDocTop
├── cache
│ └── pagelinks
├── current
├── edit-log
└── revisions
├── 00000001
├── 00000002
├── 00000003
├── 00000004
├── 00000005
├── 00000006
├── 00000007
├── 00000008
....
ar@tenis pages % cat MatrixTop/current
00000042
So if a page is deleted, the content of the current file will be a number that does not correspond to any file in the revisions subfolder. See http://moinmo.in/HelpOnPageDeletion
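The rule just described can be checked mechanically over the dump. A minimal sketch, assuming the layout shown in the tree output (one directory per page, each with a current file and a revisions/ subdirectory); deleted_pages is a hypothetical name, and the sketch does not try to detect renamed pages:

```python
from pathlib import Path

def deleted_pages(pages_dir):
    """Yield names of pages whose 'current' revision has no file in revisions/."""
    for page in sorted(Path(pages_dir).iterdir()):
        current = page / "current"
        if not current.is_file():
            continue
        rev = current.read_text().strip()  # e.g. "00000004"
        if not (page / "revisions" / rev).is_file():
            yield page.name
```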
So the list of pages DELETED in MoinMoin is below; the renamed ones are not included:
venue(2d)map(2e)png
SuquamishCommunityHouse
StandingTop
StandingGroup
ShortCLIMB
PgAccess
PetEvolution
PestTop
ParisCards
ParallelCorp
MWEs_and_Idiomatic_Expressions
LogonMrs(2f)MessageRelations
LogonMrs(2f)InformationStructure
LogonInstallation(2f)InstallationBasics
LogonInstallation(2f)CvsBasics
LkbSmaf
LkbLexDbPsqlInitialize
LkbLexDbInitialize
LkbDownload
LicensingChoices
LexDbPgAccess
LexDB_Internals
LapDevelopment(2f)Tasks
LapDevelopment(2f)SeverDeployment
LapDevelopment(2f)Abel
KyotoFutureSummitSuggestions
ItsdbTreebanking(2f)ItsdbTrouble
Initialize_LexDB
ErgSemanticsTemplate
ErgSemantics(2f)RunOnConstruction
ErgSemantics(2f)NonScopalModifiers
ErgSemantics(2f)Fundamentals
ErgProcessing(2f)ExportExample
Deepbank
ClarinoTop
ClarinoTop(2f)TechnologySurvey
ClarinoTop(2f)RequirementsSurvey
ClarinoTop(2f)RelatedWork
BarcelonaWishlist
4(2d)16_Meeting_Notes
I see ErgSemantics(2f)NonScopalModifiers there, confirming our decision to delete it in the github wiki.
Ah, I now see your point, @emilymbender. My comment https://github.com/delph-in/docs/issues/18#issuecomment-922029311 was wrong (I just edited it). The page ErgSemantics(2f)RelativeClauses was renamed to ErgSemantics(2f)NonScopalModifiers, and the latter was later deleted.
The page http://moin.delph-in.net/wiki/LkbLexDb says:
last edited 2011-10-08 21:12:12 by localhost
But the page https://github.com/delph-in/docs/wiki/LkbLexDB says:
StephanOepen edited this page on Jan 13, 2009
This is very weird, since the page in this wiki is older than the page in the original frozen MoinMoin installation, and the contents differ too. In the dump, the current file points to version 00000009, but this page in MoinMoin has 00000035 as its last revision.
% cat dump/ltg/moin/delphin/data/pages/LkbLexDb/current
00000009
% ls dump/ltg/moin/delphin/data/pages/LkbLexDb/revisions
00000001 00000005 00000009 00000013 00000017 00000021 00000025 00000029 00000033
00000002 00000006 00000010 00000014 00000018 00000022 00000026 00000030 00000034
00000003 00000007 00000011 00000015 00000019 00000023 00000027 00000031 00000035
00000004 00000008 00000012 00000016 00000020 00000024 00000028 00000032
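To check whether LkbLexDb is an isolated case, the dump could be scanned for pages whose current file names an older revision than the newest file in revisions/. A minimal sketch, assuming the same directory layout as above; stale_current is a hypothetical name, and it only flags candidates without explaining why the pointer lags behind:

```python
from pathlib import Path

def stale_current(pages_dir):
    """Yield (page, current, latest) when 'current' is older than the newest revision file."""
    for page in sorted(Path(pages_dir).iterdir()):
        current_file = page / "current"
        revs_dir = page / "revisions"
        if not (current_file.is_file() and revs_dir.is_dir()):
            continue
        revs = [p.name for p in revs_dir.iterdir() if p.name.isdigit()]
        if not revs:
            continue
        current = current_file.read_text().strip()
        latest = max(revs, key=int)
        # Deleted pages have current > latest; only flag current < latest.
        if int(current) < int(latest):
            yield page.name, current, latest
```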
It looks like this page didn't get imported: http://moin.delph-in.net/wiki/CambridgeSEM-I
It's world readable, so I wonder if the problem is that the page name is a bit odd (has a hyphen) and if so, if there might be other pages that weren't imported.
@arademaker can you import it and also see if maybe there are others?
It also looks like links to the page will need to be updated. I discovered it was missing by looking here:
https://github.com/delph-in/docs/wiki/RmrsDiscussions