derrickoswald / CIMSpark

Spark access to Common Information Model (CIM) files
MIT License

Documentation & ScalaDoc #13

Open derrickoswald opened 5 years ago

derrickoswald commented 5 years ago

This issue tracks the tasks around documentation in general and ScalaDoc in particular.

mbheinen commented 5 years ago

I want to try to tackle this one but need a bit of guidance in some areas.

Also, if I can't do it all, we may need to open separate issues to track some of them.

derrickoswald commented 5 years ago

This is a good starter topic because someone with fresh eyes has questions that documentation could answer. The ScalaDoc should be useful to new users of the code as well as experienced ones. The CHIM class is just one example: it was created very early in my Scala odyssey and could use some love.

If you look at the ScalaDoc you'll see most of it is undocumented. You don't have to do everything, but maybe just the things that cause a WTF? reaction.

True, the ScalaDocs are published on github.io - I guess my ToDo list wasn't up to date.

By 'best effort' I'm referring to places where the .eap file contains lists or formatting instructions in addition to text, e.g.

This is obviously bordering on AI to achieve it fully, but a few pattern recognition utilities may make a lot of difference to readability.

Yes, it's mostly just breaking out the existing text and making it current and easy to try on your own. You should try it and fix the places that make you stumble.

For the Zeppelin connection to the sandbox see https://zeppelin.apache.org/docs/latest/interpreter/spark.html which is mostly true, and:

  • needed to add SPARK_HOME to the Spark interpreter page (form for arbitrary stuff at the bottom)
  • conf/zeppelin-env.sh:
    • export ZEPPELIN_PORT=8980 // to avoid 8080 used by Spark GUI
    • export SPARK_HOME=/somewhere/spark/spark-2.4.3-bin-hadoop2.7/
    • export SPARK_SUBMIT_OPTIONS="--packages com.datastax.spark:spark-cassandra-connector_2.11:2.4.1" // to access the spark-cassandra connector
    • export HADOOP_HOME=/somewhere/spark/hadoop-2.7.6
    • export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

The last two are to allow access to HDFS assuming you have that set up in core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://sandbox:8020</value>
  </property>
</configuration>
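
If HDFS access from Zeppelin fails, one quick sanity check is to confirm which filesystem core-site.xml actually points at. The following is only a sketch using the Python standard library; the function name `default_fs` and the inlined sample text are illustrative assumptions, and in practice you would read the file under $HADOOP_CONF_DIR instead:

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Sample content mirroring the core-site.xml shown above (inlined for illustration).
SAMPLE_CORE_SITE = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://sandbox:8020</value>
  </property>
</configuration>
"""

def default_fs(core_site_xml: str) -> Optional[str]:
    """Return the fs.defaultFS value from core-site.xml text, or None if absent."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value")
    return None

print(default_fs(SAMPLE_CORE_SITE))
```

If the function returns None, the property is missing and Hadoop falls back to its built-in default filesystem rather than the sandbox.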

Do what you can and punt the rest to another Issue.

derrickoswald commented 5 years ago

The CIMTool project needs a small correction to fix references to modified property/member names. The JavaDoc code should use the modified member name:

  • when an _attr suffix is added
  • when the property length is changed to len
  • when the property size is changed to size1
  • when a property ending in _ is changed to _1 (why?)
  • when a property contains a space replaced by _
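
To make the renaming rules above concrete, here is a rough sketch of the mapping from a CIM property name to the modified member name. It is not the CIMTool code; the function name `member_name` and the `needs_attr_suffix` flag are hypothetical, the condition under which `_attr` gets added is not stated here so the caller has to supply it, and the trailing `_` rule is read as `x_` becoming `x_1`:

```python
def member_name(property_name: str, needs_attr_suffix: bool = False) -> str:
    """Map a CIM property name to the modified member name (illustrative only)."""
    # A space in the property name is replaced by an underscore.
    name = property_name.replace(" ", "_")
    # The properties 'length' and 'size' are renamed outright.
    if name == "length":
        name = "len"
    elif name == "size":
        name = "size1"
    # A trailing underscore becomes '_1' (assumed reading: 'x_' -> 'x_1').
    if name.endswith("_"):
        name += "1"
    # When the suffix applies is an assumption left to the caller.
    if needs_attr_suffix:
        name += "_attr"
    return name

for original in ["length", "size", "value_", "rated power", "name"]:
    print(original, "->", member_name(original))
```

The JavaDoc references would then use the value returned here rather than the original property name.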

I've sent some CIM100 Errata.pdf upstream to the CIM model manager.