dadoonet / fscrawler

Elasticsearch File System Crawler (FS Crawler)
https://fscrawler.readthedocs.io/
Apache License 2.0
1.36k stars 299 forks source link

Small bugs #1421

Open ProcopieGabi0112 opened 2 years ago

ProcopieGabi0112 commented 2 years ago

Issue Description

Good evening, @dadoonet. My name is Procopie Gabi and I decided to use a verification tool on your application for a project given by my university. I know that the bugs I found are not as consistent as the ones you are looking for but I would be happy if you could look over what I worked on.

List of bugs

. . . . .

Bug 1. Call to method of static java.text.DateFormat

Description

One of the most important things to note about SimpleDateFormat class is that it is not safe and leads to potential concurrency issues, adding additional problems to handling thread safety. 

Source

  path: elasticsearch-client/.../WorkplaceSearchClientUtil.java
WorkplaceSearchClientUtil.java

public static String toRFC3339(Date d) {
        if (logger.isDebugEnabled() && d != null) {

            String format = RFC_3339.format(d);    /* ERROR  */                                                               

            String finalDate = format.replaceAll("(\\d\\d)(\\d\\d)$", "$1:$2");
            logger.debug("Transforming date to RFC_3339 [{}] -> [{}] -> [{}]", d, format, finalDate);
        }
        return d == null ? null : RFC_3339.format(d).replaceAll("(\\d\\d)(\\d\\d)$", "$1:$2");
    } 

Solutions found

solution 1

 We can use local DateFormat or SimpleDateFormat objects for converting or formatting dates. Making them local ensure that they 
   will not be shared between calls of the function.

solution 2

 If we are sharing Date from SimpleDateFormat class then we will need to externally synchronize calls to the method  toRFC3339(Date d) because they mutate the state of SimpleDateFormat object and can create bugs while formatting Strings or creating dates.
public synchronized static String toRFC3339(Date d) {
        if (logger.isDebugEnabled() && d != null) {
            String format = RFC_3339.format(d);                                                                
            String finalDate = format.replaceAll("(\\d\\d)(\\d\\d)$", "$1:$2");
            logger.debug("Transforming date to RFC_3339 [{}] -> [{}] -> [{}]", d, format, finalDate);
        }
        return d == null ? null : RFC_3339.format(d).replaceAll("(\\d\\d)(\\d\\d)$", "$1:$2");
    } 

. . . . .

Bug 2. Incorrect lazy initialization of static field RestServer.httpServer

Description

This method contains an unsynchronized lazy initialization of a non-volatile static field. Because the compiler or processor may reorder instructions, threads are not guaranteed to see a completely initialized object, if the method can be called by multiple threads.

Source

  path: rest/.../RestServer.java
RestServer.java

    public static void start(FsSettings settings, FsCrawlerManagementService managementService, FsCrawlerDocumentService documentService) {
        // We create the service only one
        if (httpServer == null) {
            // create a resource config that scans for JAX-RS resources and providers
            // in fr.pilato.elasticsearch.crawler.fs.rest package
            final ResourceConfig rc = new ResourceConfig()
                    .registerInstances(
                            new ServerStatusApi(managementService, settings),
                            new DocumentApi(settings, documentService),
                            new UploadApi(settings, documentService))
                    .register(MultiPartFeature.class)
                    .register(RestJsonProvider.class)
                    .register(JacksonFeature.class)
                    .register(new CORSFilter(settings.getRest()))
                    .packages("fr.pilato.elasticsearch.crawler.fs.rest");

            // create and start a new instance of grizzly http server
            // exposing the Jersey application at BASE_URI
            httpServer = GrizzlyHttpServerFactory.createHttpServer(URI.create(settings.getRest().getUrl()), rc);
            logger.info("FS crawler Rest service started on [{}]", settings.getRest().getUrl());
        }
    }

Solutions found

solution 1

 If we change the definition of a member of the RestServer class namely httpServer from private static to private volatile static.
 We will notice that the bug has disappeared from the list. 
public class RestServer {

    private volatile static HttpServer httpServer = null;
    private static final Logger logger = LogManager.getLogger();

    /**
     * Starts Grizzly HTTP server exposing JAX-RS resources defined in this application.
     * @param settings FSCrawler settings
     * @param managementService The management service
     * @param documentService The document service
     */
    public static void start(FsSettings settings, FsCrawlerManagementService managementService, FsCrawlerDocumentService documentService) {
        // We create the service only one
        if (httpServer == null) {
            // create a resource config that scans for JAX-RS resources and providers
            // in fr.pilato.elasticsearch.crawler.fs.rest package
            final ResourceConfig rc = new ResourceConfig()
                    .registerInstances(

. . . . .

Bug 3. Incorrect lazy initialization of static field AbstractITCase.documentService

Description

This method contains an unsynchronized lazy initialization of a non-volatile static field. Because the compiler or processor may reorder instructions, threads are not guaranteed to see a completely initialized object, if the method can be called by multiple threads. 

Source

  path: integration-tests/.../AbstractITCase.java
@AfterClass
    public synchronized static void stopServices() throws IOException {
        staticLogger.info("Stopping integration tests against an external cluster");
        if (documentService != null) {
            documentService.close();
            documentService = null;
            staticLogger.info("Document service stopped");
        }
        if (managementService != null) {
            managementService.close();
            managementService = null;
            staticLogger.info("Management service stopped");
        }
    }

Solutions found

solution 1

 To fix this bug I think we could define function void stopServices () as synchronized.
    @AfterClass
    public synchronized static void stopServices() throws IOException {
        staticLogger.info("Stopping integration tests against an external cluster");
        if (documentService != null) {
            documentService.close();
            documentService = null;
            staticLogger.info("Document service stopped");
        }
        if (managementService != null) {
            managementService.close();
            managementService = null;
            staticLogger.info("Management service stopped");
        }
    }

Bug 4. Incorrect lazy initialization of static field AbstractITCase.managementService

Description

This method contains an unsynchronized lazy initialization of a non-volatile static field. Because the compiler or processor may reorder instructions, threads are not guaranteed to see a completely initialized object, if the method can be called by multiple threads. 

Source

  path: integration-tests/.../AbstractITCase.java
AbstractITCase.java
.
.
.
.
@AfterClass
    public static void stopServices() throws IOException {
        staticLogger.info("Stopping integration tests against an external cluster");
        if (documentService != null) {
            documentService.close();
            documentService = null;
            staticLogger.info("Document service stopped");
        }
        if (managementService != null) {
            managementService.close();
            managementService = null;
            staticLogger.info("Management service stopped");
        }
    }

Solutions found

solution 1

 To fix this bug I think we could redefine the  assignment 
protected static FsCrawlerManagementServiceElasticsearchImpl managementService;
in the assignment
protected volatile static FsCrawlerManagementServiceElasticsearchImpl managementService;
    protected final static int testRestPort = getSystemProperty("tests.rest.port", DEFAULT_TEST_REST_PORT);
    protected final static boolean testKeepData = getSystemProperty("tests.leaveTemporary", false);

    protected static Elasticsearch elasticsearchWithSecurity;
    protected volatile static FsCrawlerManagementServiceElasticsearchImpl managementService;
    protected static FsCrawlerDocumentService documentService;

. . . . .

Bug 5. AbstractITCase.testClusterUrl isn't final and can't be protected from malicious code

Description

A mutable static field could be changed by malicious code or by accident from another package. Unfortunately, the way the field is used doesn't allow any easy fix to this problem.

Source

  path: integration-tests/.../AbstractITCase.java
AbstractITCase.java
.
.
.
    private final static String DEFAULT_USERNAME = "elastic";
    private final static String DEFAULT_PASSWORD = "changeme";
    private final static Integer DEFAULT_TEST_REST_PORT = 8080;

    protected static String testClusterUrl;
    protected final static String testClusterUser = getSystemProperty("tests.cluster.user", DEFAULT_USERNAME);
    protected final static String testClusterPass = getSystemProperty("tests.cluster.pass", DEFAULT_PASSWORD);
.
.

Solutions found

solution 1

 To fix this bug I think we could redefine the  assignment 
protected static String testClusterUrl;
in the assignment
protected volatile static String testClusterUrl;
AbstractITCase.java
.
.
.
    private final static String DEFAULT_USERNAME = "elastic";
    private final static String DEFAULT_PASSWORD = "changeme";
    private final static Integer DEFAULT_TEST_REST_PORT = 8080;

    protected volatile static String testClusterUrl;
    protected final static String testClusterUser = getSystemProperty("tests.cluster.user", DEFAULT_USERNAME);
    protected final static String testClusterPass = getSystemProperty("tests.cluster.pass", DEFAULT_PASSWORD);
.
.

. . . . .

Bug 6. AbstractRestITCase.target isn't final and can't be protected from malicious code

Description

A mutable static field could be changed by malicious code or by accident from another package. Unfortunately, the way the field is used doesn't allow any easy fix to this problem.

Source

  path: integration-tests/.../AbstractRestITCase.java
public abstract class AbstractRestITCase extends AbstractITCase {

    protected static WebTarget target; /* ERROR */
    protected static Client client;

    public static <T> T get(String path, Class<T> clazz) {
        if (staticLogger.isDebugEnabled()) {

Solutions found

solution 1

To fix this bug I think we could redefine the  assignment 
protected static WebTarget target; 
in the assignment
protected volatile static WebTarget target; 
public abstract class AbstractRestITCase extends AbstractITCase {

    protected  volatile static WebTarget target; 
    protected static Client client;

    public static <T> T get(String path, Class<T> clazz) {
        if (staticLogger.isDebugEnabled()) {

. . . . .

Bug 7. AbstractRestITCase.client isn't final and can't be protected from malicious code

Description

A mutable static field could be changed by malicious code or by accident from another package. Unfortunately, the way the field is used doesn't allow any easy fix to this problem.

Source

    path: integration-tests/.../AbstractRestITCase.java
public abstract class AbstractRestITCase extends AbstractITCase {

    protected static WebTarget target; 
    protected static Client client;  /* ERROR */

    public static <T> T get(String path, Class<T> clazz) {
        if (staticLogger.isDebugEnabled()) {

Solutions found

solution 1

To fix this bug I think we could redefine the  assignment 
 protected static Client client;
in the assignment
 protected volatile static Client client;
public abstract class AbstractRestITCase extends AbstractITCase {

    protected static WebTarget target; 
    protected volatile static Client client;  

    public static <T> T get(String path, Class<T> clazz) {
        if (staticLogger.isDebugEnabled()) {

. . . . .

Bug 8. testCustomTikaConfig() may fail to clean up java.io.InputStream on checked exception

Description

This method may fail to clean up (close, dispose of) a stream, database object, or other resource requiring an explicit cleanup operation.

Source

  path:  tika\...\TikaDocParserTest.java
    @Test
    public void testCustomTikaConfig() throws IOException, URISyntaxException {
        InputStream tikaConfigIS = getClass().getResourceAsStream("/config/tikaConfig.xml");
        Path testTikaConfig = rootTmpDir.resolve("tika-config");
        if (Files.notExists(testTikaConfig)) {
            Files.createDirectory(testTikaConfig);
        }
        Files.copy(tikaConfigIS, testTikaConfig.resolve("tikaConfig.xml"));

        FsSettings fsSettings = FsSettings.builder(getCurrentTestName())
            .setFs(Fs.builder().setTikaConfigPath(testTikaConfig.resolve("tikaConfig.xml").toString()).build())
            .build();

        // Test that default parser for HTML is HTML parser
        Doc doc = extractFromFile("test.html");
        assertThat(doc.getContent(), not(containsString("Test Tika title")));
        assertThat(doc.getContent(), containsString("This second part of the text is in Page 2"));

        // Test HTML parser is never used, TXT parser used instead
        doc = extractFromFile("test.html", fsSettings);
        assertThat(doc.getContent(), containsString("<title>Test Tika title</title>"));

        // Test that default parser for XHTML is HTML parser
        doc = extractFromFile("test.xhtml");
        assertThat(doc.getContent(), not(containsString("Test Tika title")));
        assertThat(doc.getContent(), containsString("This is an example of XHTML"));

        // Test XML parser is used to parse XHTML
        doc = extractFromFile("test.xhtml", fsSettings);
        assertThat(doc.getContent(), containsString("Test Tika title"));
        assertThat(doc.getContent(), not(containsString("<title>Test Tika title</title>")));
    }

Solutions found

solution 1

In general, if a method opens a stream or other resource, the method should use a try/finally block to ensure that the stream or resource is cleaned up before the method returns.
    @Test
    public void testCustomTikaConfig() throws IOException, URISyntaxException {

        InputStream tikaConfigIS = null;
        try {
            tikaConfigIS = getClass().getResourceAsStream("/config/tikaConfig.xml");
            Path testTikaConfig = rootTmpDir.resolve("tika-config");
            if (Files.notExists(testTikaConfig)) {
                Files.createDirectory(testTikaConfig);
            }
            Files.copy(tikaConfigIS, testTikaConfig.resolve("tikaConfig.xml"));

            FsSettings fsSettings = FsSettings.builder(getCurrentTestName())
                    .setFs(Fs.builder().setTikaConfigPath(testTikaConfig.resolve("tikaConfig.xml").toString()).build())
                    .build();

            // Test that default parser for HTML is HTML parser
            Doc doc = extractFromFile("test.html");
            assertThat(doc.getContent(), not(containsString("Test Tika title")));
            assertThat(doc.getContent(), containsString("This second part of the text is in Page 2"));

            // Test HTML parser is never used, TXT parser used instead
            doc = extractFromFile("test.html", fsSettings);
            assertThat(doc.getContent(), containsString("<title>Test Tika title</title>"));

            // Test that default parser for XHTML is HTML parser
            doc = extractFromFile("test.xhtml");
            assertThat(doc.getContent(), not(containsString("Test Tika title")));
            assertThat(doc.getContent(), containsString("This is an example of XHTML"));

            // Test XML parser is used to parse XHTML
            doc = extractFromFile("test.xhtml", fsSettings);
            assertThat(doc.getContent(), containsString("Test Tika title"));
            assertThat(doc.getContent(), not(containsString("<title>Test Tika title</title>")));

        } finally {
            if (tikaConfigIS != null) {
                tikaConfigIS.close();
            }
        }

    }
dadoonet commented 2 years ago

Thanks for your analysis and report. Would you like to open a PR for each of the issues you found?

ProcopieGabi0112 commented 2 years ago

Sure, I will try to do the PR this weekend.