dkpro / dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
https://dkpro.github.io/dkpro-core
Other
196 stars 67 forks source link

PRN and AUX are reserved names on Windows #414

Closed reckart closed 7 years ago

reckart commented 9 years ago
DKPro Core ASL cannot be checked out on Windows 7. And here is the reason: http://stackoverflow.com/questions/22769059/unable-to-checkout-using-tortoise-svn-file-already-exists-error-for-two-file/22771580#22771580

So constituent type PRN.java should be renamed.

Original issue reported on code.google.com by pedrobssantos on 2014-07-01 21:56:01

reckart commented 9 years ago
There have been quite some people working on DKPro Core under Windows and nobody ever
complained about this. What SVN client do you use?

Original issue reported on code.google.com by richard.eckart on 2014-07-02 04:47:15

reckart commented 9 years ago
1.8.8

Original issue reported on code.google.com by pedrobssantos on 2014-07-02 07:43:05

reckart commented 9 years ago
Which client? TortoiseSVN, Eclipse (pure java or JavaHL?), svn command line, ...?

Original issue reported on code.google.com by richard.eckart on 2014-07-02 07:44:23

reckart commented 9 years ago
I just tried to create a file called prn.java and it was not possible.

http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words

Original issue reported on code.google.com by pedrobssantos on 2014-07-02 07:45:15

reckart commented 9 years ago
Command line. But I really don't think that is the case. Anyway, I can post on the developers
list asking if someone also have the same the problem if you want to.

Original issue reported on code.google.com by pedrobssantos on 2014-07-02 07:59:20

reckart commented 9 years ago
And the same problem occurs with tortoise, subclipse, and etc.

Original issue reported on code.google.com by pedrobssantos on 2014-07-02 08:00:14

reckart commented 9 years ago
I can confirm this error (just decided to also use Windows for development from now
on - due to a separate reason)

Checkout error; svn: Cannot create new file 'C:\Users\Eckle-Kohler\workspace\maven.1404288938825\de.tudarmstadt.ukp.dkpro.core.api.syntax-asl\src\main\java\de\tudarmstadt\ukp\dkpro\core\api\syntax\type\constituent\.svn\tmp\text-base\PRN.java.svn-base'

(Windows 7, checkout in eclipse via Subclipse integration)

Original issue reported on code.google.com by eckle.kohler on 2014-07-02 08:22:50

reckart commented 9 years ago
If we rename this class, there is a risk of breakage because existing CASes may no longer
be readable, e.g. in CSniper. We can add this to the list of things to change when
we restructure the type system (actually I think it covered by issue 79), but before
that, I'd prefer not to touch it. Until then, you can check out everything on Windows
except the syntax module - which you would have to draw in as a Maven dependency. If
the PRN.class is in a JAR, it should not be a problem.

Original issue reported on code.google.com by richard.eckart on 2014-07-02 08:28:25

reckart commented 9 years ago
(No text was entered with this change)

Original issue reported on code.google.com by richard.eckart on 2014-08-06 20:19:04

reckart commented 9 years ago
this issue appears not to exist anymore

just checked out trunk of both DKPro Core asl and gpl under Windows 7 - everything
checks out and builds fine

performed checkout in eclipse (Luna Release (4.4.0)) via Subclipse integration

Original issue reported on code.google.com by eckle.kohler on 2014-11-19 09:23:29

reckart commented 9 years ago
issue persists - must have been a fata morgana when I tried it last time

Original issue reported on code.google.com by eckle.kohler on 2014-11-24 19:05:32

logological commented 8 years ago

Would it be possible to duplicate the PRN class (giving the duplicate a different name such as, say, PRNx), update all references to PRN to PRNx, mark the original as @deprecated, and then delete the original PRN two releases from now? This will give maintainers of other programs advance warning of the new class name and time to update their code.

A potential problem with changing the name of the class is that there are an awful lot of string references to "PRN" in the code. If any of these strings are dynamically mapped to class names, we'll need to handle "PRNx" -> PRN as a special case.

reckart commented 8 years ago

It would be nice to solve this problem as part of migrating to some universal phrase tag set (consituents) #268.

reckart commented 8 years ago

@pkluegl :)

pkluegl commented 8 years ago

This is not only PRN I think, but also AUX?

https://en.wikipedia.org/wiki/Filename

In addition, in Windows and DOS utilities, some words are also reserved and cannot be used as filenames.[17] For example, DOS device files:[19]

CON, PRN, AUX, CLOCK$, NUL COM1, COM2, COM3, COM4 LPT1, LPT2, LPT3, LPT4 (LPT4 only in some versions of DR-DOS) LST (only in 86-DOS and DOS 1.xx) KEYBD$, SCREEN$ (only in multitasking MS-DOS 4.0) $IDLE$ (only in Concurrent DOS 386, Multiuser DOS and DR DOS 5.0 and higher) CONFIG$ (only in MS-DOS 7.0-8.0)

reckart commented 8 years ago

Grmpf.... Windows....

AUX is a UD name... http://universaldependencies.org/u/pos/AUX_.html

If we change that, we loose the 1-to-1 mapping with the UD that we wanted to have as part of https://github.com/dkpro/dkpro-core/issues/76 ...

One option would be to prefix all the POS types with e.g. UD_...

Any suggestions/comments? @logological @pkluegl @zesch @Horsmann @judithek @carschno @dkpro/core-dev

Horsmann commented 8 years ago

I think it makes sense to show which standard we are using for our POS. UD seems to become a more or less well known abbreviation - which should make it clear that we use the universal dependencies

Maybe a suffix instead of a prefix ?

pkluegl commented 8 years ago

I would prefer a prefix over a suffix (but my opinion should not be weighted too high as I am not an active DKPro Core user). We use the type of annotation as prefix, not the standard, e.g., the type would be something like POSTagAux

reckart commented 8 years ago

@pkluegl aren't such verbose names inconvenient when writing Ruta rules?

pkluegl commented 8 years ago

Yes, they are :-) This was not a proposal, just sharing alternatives. On the other hand, same short names for different packages (as in DKPro) is also not optimal for Ruta.

reckart commented 8 years ago

Well, the constituents tags need to be cleaned up anyway. I don't know if there is an overlap between UD dependencies and UD pos tags.

So just to throw a few additional naming schemes here:

pkluegl commented 8 years ago

$AUX: ahhh... not good. After all problems there might be, at least Ruta won't work with this. pAUX: I'd rather stay with the convention that types start with a upper-case letter P_AUX: P is not a good abbreviation for pos in my opinion, not really a big advantage over POS_AUX

POS_AUX, UD_AUX, AUX_UD would be ok I think, preference in this order (descreasing).

Is the capitalization important for you? Camel-case is something better for auto-completion.

(Just wanted to mention that Ruta can handle the same short names, but you need aliasing or something when importing the typesystem packages.)

reckart commented 8 years ago

Regarding capitalization: as mentioned before, we wanted to be in sync with the UD POS tags, and they are all capitalized: http://universaldependencies.org/u/pos/index.html

pkluegl commented 8 years ago

Ah ok, I thought they would also be identifiable if the case was changed for some chars :-D nevermind

reckart commented 8 years ago

Well, in principle yes - I would like to keep the extract logic going from Windows-safe-names to UD names as minimal as possible. Removing/adding an affix is an unambiguous substring operation. Changing the case is not necessarily unambiguous and is simple yet another operation that I'd prefer to avoid.

judithek commented 8 years ago

@reckart there are overlaps between the universal POS tags and the universal dependencies, e.g. POS: AUX, see http://universaldependencies.org/u/pos/AUX_.html (here the mouseover text says: u-pos AUX) dependency: aux see http://universaldependencies.org/u/dep/aux_.html (the mouseover text says: u-dep aux)

another example: conj

logological commented 8 years ago

I think there is no all-around best solution, though if an affix is to be introduced, it's probably better to be a prefix.

ferschke commented 7 years ago

Moin moin, are there plans to address this issue before 1.9.0 is released? Just curious, because this was mentioned to me by a co-worker and I told him I'd ask. For some reason, our dev machines are all Windows (but usually have a Linux VM on them, so we could somehow make it work).

reckart commented 7 years ago

@ferschke Care to contribute?

ferschke commented 7 years ago

I could look into this, but from the discussion it seems there are still quite a couple of decisions to be made how to do this.

reckart commented 7 years ago

So I tend towards using POS_AUX as a schema for the type names then.

reckart commented 7 years ago

We have a Jenkins job now that tries compiling DKPro Core on Windows. Of course it fails, but with the new job, we have a better way to noticing these failures and fixing them.

pkluegl commented 7 years ago

I'll take a look at the types.

pkluegl commented 7 years ago

Ok, I'll have several questions.

Here's the first one: Should the tweet POS tags also get a POS prefix?

reckart commented 7 years ago

I guess so.

reckart commented 7 years ago

I suppose we'd end with prefixes for any type of POS, Dependency and Constituent subclasses like:

rezahay commented 7 years ago

Dear Richard, Is this issue regarding build-failure on windows (10)? I just tried to build 1.9 but I get the following error:

[ERROR] Failed to execute goal org.apache.uima:jcasgen-maven-plugin:2.10.0:generate (default) on project de.tudarmstadt.ukp.dkpro.core.api.lexmorph-asl: JCasGen: Input/Output exception occurred. [ERROR] Exception: D:\Workspaces\GitRepository\dkpro-core\dkpro-core-api-lexmorph-asl\target\generated-sources\jcasgen\de\tudarmstadt\ukp\dkpro\core\api\lexmorph\type\pos\AUX.java (The system cannot find the file specified)

Regards, Reza

pkluegl commented 7 years ago

@rezahay , yes this problem is covered by this issue.

pkluegl commented 7 years ago

I renamed the POS types now and fixed most of the tests. There are still many other test failures on my machine which could cover some missing renamings. My pull request waits for the ICLA answer.

What about the types for the syntax annotations? The correspnding artifact is building without problems since the customized (with PRN) is excluded.

pkluegl commented 7 years ago

I mean is it worth renaming AUX0 to DEP_AUX when it means we are renaming about 58-87 types, and there is no urgent need (as of we can build with windows)

reckart commented 7 years ago

Well... do you want to be able to do a full build on windows or not? Do you think it would be sufficient to build only the non-type modules?

pkluegl commented 7 years ago

... of course a full build. The thing is: I can build api.syntax-asl

There are some other modules like fs.hdfs-asl which could cause some troubles for windows but I would not cover them here.

reckart commented 7 years ago

You can build api.syntax-asl because somebody back in the day already hit the problem that AUX was a reserved name on Windows and changed it to AUX0. This was a long time ago.

Then the Universal POS tags and Universal Dependencies came into existence and we had a few discussions around switching from "DKPro Core specific" coarse-grained tags to commonly accepted coarse grained tags:

This brought back the problem that tag names clashed with Windows device names.

Then we had some discussion around how to resolve such clashes. I was in favor of doing it in a way that affects all tags equally, e.g. adding a prefix or suffix. The benefit would be that we could arrive at the proper coarse-grained tag simply by stripping the affix - and it would work in the same for all tags - no special treatment for individual tags.

Am I understanding correctly that you would be in favor of an approach that would introduce exceptions only for specific tags?

pkluegl commented 7 years ago

Nope, it depends on the perspective:

Afterall, its about when 1.9.0 should be released and how much time is available to do the changes.

pkluegl commented 7 years ago

What is the status of #1088, #516 and #517, and what is there impact on this issue?

1088 is half way done, right? What about the other two? When we rename all syntax types, will they be removed as a next step?

reckart commented 7 years ago

1088 would be kind of re-started by the present issue because most of the steps already taken in #1088 are affected.

517 and #516 haven't had too much attention yet. I suppose the two don't have an immediate impact on the present issue because the api.sytnax module appears to be building on Windows. Their relation to the present issue is that addressing #517 and #516 are also kind of introducing a "breaking" type system change.

Btw. the present issue is introducing a "more breaking" change because we actually have to delete the POS types with the names that conflict with Windows device names. Thus, by default DKPro Core 1.9.0 would no longer be able to load XMIs annotated with 1.8.0 POS type names.

reckart commented 7 years ago

While looking at #1090, I noticed that the coarseValue of the POS annotation is now often (if not always) set to POS_something. However, the coarseValue should not include the POS_ prefix. That's something that needs to be changed everywhere where the coarseValue is set.

reckart commented 7 years ago

Usually we currently use this code:

                posAnno.setCoarseValue(posAnno.getClass().equals(POS.class) ? null
                        : posAnno.getType().getShortName().intern());

That needs to be changed to strip the POS_ prefix.